This listing of claims will replace all prior versions, and listings, of claims in the Application: 



LISTING OF CLAIMS:

Claim 1 (currently amended): A method for determining a genotype associated with 
increased or decreased resistance to familial bipolar affective disorder in a family affected by
bipolar affective disorder, comprising: 

determining the genotype of at least one family member, wherein the genotype 
is determined with at least one marker for at least one chromosomal region linked to a locus 
associated with resistance to bipolar affective disorder, wherein the chromosomal regions are 
inclusive of and localized between D4S402 and D4S424; inclusive of and localized between
D4S431 and D4S404; or inclusive and localized between D11S394 and D11S29;

determining, after the age of onset, the bipolar affective disorder disease status 
in the family member; 

comparing the genotype with the bipolar affective disorder disease status; and 

determining therefrom the genotype associated with increased or decreased
resistance to bipolar affective disorder. 

Claim 2 (original): The method of claim 1, wherein the genotype is determined with markers 
for at least two of the chromosomal regions. 

Claim 3 (original): The method of claim 2, wherein the genotype is determined with markers 
for three of the chromosomal regions. 

Claim 4 (original): The method of claim 1, wherein the chromosomal region is inclusive of 
and localized between markers D4S422 and D4S1625. 

Claim 5 (original): The method of claim 4, wherein the marker is D4S1 75, D4S422, 
D4S1576, D4S2294, D4S1579, D4S397, D4S3089, D4S2965, D4S192, D4S420, D4S1644, 
D4S3334, or combinations thereof. 

Claim 6 (original): The method of claim 1, wherein the chromosomal region is inclusive of 
and localized between markers D4S3007 and D4S419. 

Claim 7 (original): The method of claim 6, wherein the marker is D4S3007, D4S394,
D4S2983, D4S2923, D4S615, AFMJ84za9, D4S2928, D4S1065, D4S1582, D4S107, 
D4S3009, D4S2906, D4S2949, AFM087zg5, D4S2944, D4S403, D4S2942, D4S2984, 
D4S1602, D4S1511, D4S2311, D4S3048, or combinations thereof. 

Claim 8 (original): The method of claim 7, wherein the marker is D4S3009, D4S2906, 
D4S2949, AFM087zg5, D4S2944, D4S403, D4S2942, D4S2984, D4S1602, D4S1511, 
D4S2311, or combinations thereof. 

Claim 9 (original): The method of claim 1, wherein the chromosomal region is inclusive of 
and localized between markers D11S133 and D11S29. 

Claim 10 (original): The method of claim 9, wherein the marker is D11S133, D11S147, 
CD3D, D11S285, D11S29, or combinations thereof. 

Claim 11 (original): The method of claim 1, wherein the genotype at a single chromosomal
region is determined with at least three markers. 

Claim 12 (original): The method of claim 1, wherein the marker is for a restriction fragment 
length polymorphism or microsatellite polymorphism. 

Claim 13 (withdrawn) 

Claim 14 (withdrawn) 

Claim 15 (original): The method of claim 1, wherein the marker is amplified. 

Claim 16 (original): The method of claim 15, wherein the marker is amplified by the 
polymerase chain reaction. 

Claim 17 (original): The method of claim 1, wherein the presence or absence of an allele 
associated with increased resistance to bipolar affective disorder is determined. 






(09/881,012) 



Claim 18 (original): The method of claim 1, wherein the genotype of an affected family 
member is determined. 

Claim 19 (original): The method of claim 1, wherein the genotype of a non-affected family 
member is determined. 

Claim 20 (original): The method of claim 1, further comprising: 

determining the genotype of at least one family member, wherein the genotype 
is determined with at least one marker for at least one chromosomal region linked to a locus 
associated with susceptibility to bipolar affective disorder, wherein the chromosomal regions 
are inclusive of and localized between D6S344 and D6S89; inclusive of and localized
between D13S171 and D13S218; or at about D15S148.

Claim 21 (currently amended): The method of claim 1, further comprising: 

determining the genotype of a tested individual from the affected family, 
wherein the genotype is determined with at least one marker for at least one chromosomal 
region linked to a locus associated with resistance to bipolar affective disorder, wherein the 
chromosomal regions are inclusive of and localized between D4S402 and D4S424; inclusive 
of and localized between D4S431 and D4S404; or inclusive and localized between D11S133
and D11S29;

comparing the genotype of the tested individual to the genotype associated 
with increased or decreased resistance to bipolar affective disorder; and

determining therefrom the increased or decreased risk of the tested individual
developing familial bipolar affective disorder. 

Claim 22 (original): The method of claim 21, wherein the genotype of the tested individual is 
compared to the genotype of an affected family member. 

Claim 23 (currently amended): A method for determining the contribution of a chromosomal 
region to the presence or absence of resistance to bipolar affective disorder in a family
affected by bipolar affective disorder, comprising: 

determining the corresponding genotype of at least two family members, 
wherein the genotype is determined with at least one marker for at least one tested 
chromosomal region linked to a locus associated with resistance to bipolar affective disorder, 
wherein the tested chromosomal regions are inclusive of and localized between D4S402 and
D4S424; inclusive of and localized between D4S431 and D4S404; or inclusive and localized
between D11S133 and D11S29;

determining, after the age of onset, the bipolar affective disorder disease status
in the family members;

comparing the genotypes of the family members; and

determining therefrom the contribution of the chromosomal region to the
presence or absence of resistance to bipolar affective disorder in the family.

Claim 24 (currently amended): A method for determining a genotype associated with 
increased susceptibility or decreased resistance to familial bipolar affective disorder in a 
family affected by bipolar affective disorder, comprising: 

determining the genotype of at least one family member, wherein the genotype
is determined with at least one marker for at least one chromosomal region linked to a locus
associated with resistance to bipolar affective disorder, wherein the chromosomal regions are
inclusive of and localized between D4S402 and D4S424; inclusive of and localized between
D4S431 and D4S404; or inclusive and localized between D11S133 and D11S29;

determining the genotype of at least one family member, wherein the genotype 
is determined with at least one marker for at least one chromosomal region linked to a locus 
associated with susceptibility to bipolar affective disorder, wherein the chromosomal regions 
are inclusive of and localized between D6S344 and D6S89; inclusive of and localized 
between D13S171 and D13S218; or at about D15S148;

determining, after the age of onset, the bipolar affective disorder disease status 
in the family member; 

comparing the genotype with the bipolar affective disorder disease status; and 

determining therefrom the genotype associated with increased susceptibility or
decreased resistance to bipolar affective disorder.

Claim 25 (original): The method of claim 24, wherein the marker associated with 
susceptibility is D6S7, D13S1, D15S45, or combinations thereof. 

Claim 26 (currently amended): The method of claim 24, further comprising: 

determining the genotype of a tested individual from the affected family, 
wherein the genotype is determined with at least one marker for at least one chromosomal
region linked to a locus associated with resistance to bipolar affective disorder, wherein the 
chromosomal regions are inclusive of and localized between D4S402 and D4S424; inclusive 
of and localized between D4S431 and D4S404; or inclusive and localized between D11S133
and D11S29; 

comparing the genotype of the tested individual to the genotype associated 
with increased susceptibility or decreased resistance to bipolar affective disorder; and

determining therefrom the increased or decreased risk of the tested individual 
developing familial bipolar affective disorder. 

Claim 27 (withdrawn) 

REMARKS/ARGUMENTS

Claims 1-27 are pending. Claims 1, 21, 23, 24 and 26 have been amended. Claims 
13, 14 and 27 have been withdrawn. 

The substance of claim 1 has been divided between claims 1 and 24. No new matter 
has been added. 

The documents to which applicant wishes to claim priority are properly indicated in 
the first sentence of the application. 

In the specification, the paragraphs beginning at page 22, line 1 and continuing 
through page 23, line 10, have been amended to specify the SEQ ID NO: corresponding to 
each sequence. This amendment renders moot the objection with regard to "Sequence 
Rules." Support for this amendment is found, e.g., in the substitute sequence listing filed 
June 13, 2001.

Rejections under 35 USC §112 

At the outset, it is noted that the Examiner has apparently misunderstood the claimed 
invention. The enablement and the written description rejections appear to be directed to 
claims that recite a genotype that is associated with increased or decreased resistance to 
bipolar disorder (e.g., the result of a method for determining a genotype associated with 
increased or decreased resistance to familial bipolar affective disorder (BPAD)). In fact, the
claims under consideration are directed to methods, e.g., a method for determining a genotype
associated with increased resistance to (protection from) familial bipolar affective disorder
(BPAD) . . . (independent claim 1); a method for determining the contribution of a
chromosomal region to the presence of resistance to BPAD . . . (independent claim 23); or a
method for determining a genotype associated with increased susceptibility to familial bipolar
affective disorder (BPAD) . . . (independent claim 24). The specification provides both
enablement and written description for performing such methods. 

The Examiner appears to doubt applicant's assertion in the specification that markers 
identified therein exhibit statistically significant linkage to either susceptibility to BPAD 
(e.g., D6S7 on chromosome 6; D13S1 on chromosome 13; and D15S45 on chromosome 15)
or resistance to BPAD (e.g., D4S2949 on chromosome arm 4p; D4S397 on chromosome arm
4q; and D11S133 and D11S29 on chromosome 11). However, the Examiner has presented
no reasons or evidence to doubt this assertion. 

In fact, the specification clearly shows, e.g., that D6S7, D13S1 and D15S45 are each
linked to susceptibility to BPAD with a SIBPAL p value of 0.0003 (see, e.g., Figures 3, 4 and
5, respectively). Furthermore, the specification shows that D4S2949 is linked to resistance to
(protection from) BPAD with a SIBPAL p value of < 5 x 10^-5 (see, e.g., Figure 6 and page 44,
lines 11-13); D4S2949 is linked to resistance to BPAD with a SIBPAL p value of 0.0001
(see, e.g., Figure 7); and D11S133 and D11S29 are linked to resistance to BPAD with a
nominal p value of < 5 x 10^-5 (see, e.g., page 44, lines 15-16). One of skill in the art would
know that a SIBPAL p value (statistical significance of the genetic linkage between markers
based on sib-pair analysis) of < 0.001 is statistically significant under these circumstances.
See also the specification at page 34, lines 13-15 ("those markers having a test statistic of p
< 0.001 in any one analysis type . . . showed evidence for linkage").

Note that p values are the appropriate statistical criteria for a study of common traits, 
such as BPAD. LOD scores, on the other hand, are an appropriate statistical tool for 
evaluating rare disorders. As the specification states on page 34, lines 27-31, although the lod 
scores in the present study did not reach a LOD=3 criterion, "p values of less that 0.001 (and 
even 0.0001, which is asymptotically equivalent to Zmax = 3) are found. Together, these results
lend further support to the significance of these intervals as candidate regions." See also the
attached articles by Lander and Schork (Lander et al. (1994) Science 265, 2037-2038) and
Lander and Kruglyak (Lander et al. (1995) Nature Genetics 11, 241-247) (especially page
244), both of which were published before the filing date of the present application, for 
discussions of the two statistical methods, and when it is appropriate to use each. 

The present specification is replete with sophisticated analyses of the data, employing 
a variety of statistical techniques, which support the significance of the linkages. The 
allegation that earlier studies of linkage in BPAD may have been faulty (e.g., as reported in
the Berrettini reference cited by the Examiner, and as acknowledged in the specification, for 
example at page 2, line 27 to page 3, line 2) does not cast doubt on the accuracy of the present 
findings. 

In the methods recited in the instant claims, family members are tested for the 
presence of markers that flank the markers discussed above, e.g., that lie about 10 cM on 
either side of one of those markers. See, e.g., the specification at page 14, lines 29-31 and at 
page 15, lines 5-7. For example, Claim 1 recites determining the genotype with at least one 
marker for a region inclusive and localized between D4S402 and D4S424 (on chromosome arm
4q); or inclusive and localized between D4S431 and D4S404 (on chromosome arm 4p). As
shown in Fig. 7 (for chromosome 4q), marker D4S402 lies ~32 cM from the marker noted
above, D4S397 (~50 cM - ~18 cM); and, on the other side of this marker, D4S424 lies ~10
cM away (~60 cM - ~50 cM). As shown in Fig. 6 (for chromosome 4p), marker D4S431 lies
~15 cM from the marker noted above, D4S2949 (~31 cM - ~15 cM); and, on the other side of
this marker, D4S404 lies ~17 cM away (~48 cM - ~31 cM). Shorter spans of markers are
recited in dependent claims, as are individual markers within these spans. 

One of skill in the art would know that if a particular marker has been shown to be 
linked to, for example, resistance to (protection from) BPAD, markers lying within a span of
about 10 cM on either side of the marker represent likely candidates for further markers 
exhibiting a linkage to the condition. Moreover, a skilled worker would know that markers 
even more distant (e.g., at least as far as about 30 cM) would still be considered to be linked 
to the marker in question and thus would represent a suitable region for additional candidate 
markers linked to that condition. In the present application, linkage studies indicate that 
several of the suggested regions do contain markers, in addition to those discussed above, that 
are linked to susceptibility or resistance to (protection from) BPAD. For example, Fig. 6
shows that at least markers D4S2366, D4S1582, D4S419 and D4S404 on chromosome arm 4p
have a SIBPAL p value of < 0.001.

In the method of claim 1, for example, a genotype of a (at least one) family member is 
determined as discussed above; the BPAD status of the family member is determined; the
genotype is compared with the BPAD status; and from this comparison, it is determined if the 
genotype is associated with increased resistance to BPAD. 

The specification is fully enabling for the recited methods: it teaches how to obtain 
samples from family members (e.g., at page 19, lines 12-27 and Example I); how to perform 
the genotypic analysis (e.g., in Example II); how to assess BPAD status (e.g., in Examples I 
and IV) and how to analyze statistically whether the genotype determined is associated with 
increased or decreased resistance for BPAD (e.g., in Example III). The specification teaches
a variety of markers that can be used in the claimed methods, and teaches how additional 
markers can be generated (e.g., at page 23, line 27 to page 27, line 10). The Office Action 
states at page 7 that "While one could conduct additional experimentation to determine 
whether markers exist within the recited regions on chromosome 4 and 11 and these newly
discovered markers are associated with BPAD, the outcome of such research cannot be 
predicted." That is to some extent correct. The methods of the invention are directed in part 
to determining whether such markers are, in fact, associated with BPAD. 




Not only is the specification enabling for the claimed invention, but it also provides 
written description. Contrary to the allegation of the Examiner on page 10 of the Office
Action, the specification provides even more than a "representative number" of markers 
within each of the claimed regions of the chromosomes. For example, some suitable markers 
to test for resistance to BPAD, flanking D4S2949 (on chromosome arm 4p), are indicated at
page 14, line 31 to page 15, line 4; some suitable markers flanking D4S397 (on chromosome
4q) are indicated at page 15, lines 8-9; and some suitable markers on chromosome 11, in the
~20 cM span between D11S133 and D11S29, are indicated at page 15, line 13. See also
further candidate markers for resistance indicated on page 18, lines 5-13. Examples of
suitable primers for identifying the above markers are indicated on page 22, line 1 to page 23, 
line 10. The specification also teaches some suitable markers to test for susceptibility for 
BPAD, and methods for identifying those markers. 

As for the allegation in the Office Action on page 7 that "The ability to screen for a
wellness allele is even more unpredictable because it is very difficult to distinguish between
the presence of a protective allele and the absence of a susceptibility allele," the specification
clearly teaches how to distinguish between these two possibilities. For example, the
specification states at page 39, lines 9-16 that

Importantly, because of the long-term, longitudinal nature of the study, even the 
unaffected, mentally healthy individuals (those without any psychiatric illness) in 
these families have been closely followed, many for a period of years past the age of 
risk for BPAD. Consequently, rather than limit this genome-wide search to 
identifying susceptibility loci for the disease phenotype (BPAD), we tested the 
hypothesis that "protective" alleles may contribute to the absence of psychiatric 
illness (i.e., mental health 'wellness') in unaffected family members in the 'high
risk' pedigrees. 

and at page 51, lines 13-16: 

Accordingly, an important step in our study which demonstrates that there are 
"protective" alleles was to show that there are "mentally healthy" individuals who 
share marker alleles that should increase the risk of developing BPAD, and yet, in 
the presence of 'protective' alleles these individuals do not manifest BPAD. 

The anticipation rejection 

The Blackwood reference cited by the Examiner as allegedly anticipating the claimed 
invention reports that certain markers on chromosome 4p, in the vicinity of marker D4S394, 
are associated with susceptibility to BPAD. By contrast, the present inventors have identified 
a region close to that identified by Blackwood, which is linked to resistance to (protection 
from) BPAD. Applicants believe that the methods recited in the claims are clearly
distinguished from Blackwood. Nevertheless, in an effort to expedite prosecution, claims that 
recite the use of certain markers on chromosome 4p have been amended to clarify that the 
methods are for determining a genotype associated with increased resistance (rather than 
increased susceptibility) to BPAD (claim 1) or for determining the contribution of a
chromosomal region to the presence of resistance to BPAD (claim 23). Blackwood does not 
disclose a method for determining a genotype associated with increased resistance to BPAD. 
In order to anticipate a claim, a reference must disclose all material elements of the claim (In
re Marshall, 198 USPQ 344 (CCPA 1978)). That is clearly not the case here. (Claim 24,
which recites a method for determining a genotype associated with increased susceptibility to
BPAD does not recite markers in the region studied by Blackwood, and for at least this reason
is not anticipated by Blackwood.) 

In view of the preceding amendments and arguments, the application is believed to be 
in condition for allowance, which action is respectfully requested. 

Should any additional fee be deemed due, please charge such fee to our Deposit 
Account No. 22-0261, referencing docket number 3 1 67 M 86347 and advise us accordingly. 
However, the Commissioner is not authorized to charge the cost of the issue fee to the 
Deposit Account. 



Respectfully submitted, 



Date: July 22, 2003 




Nancy J. Axelrod, Ph.D. (Patent Agent) 
Registration No. 44,014 

Genetic studies are under way for many complex traits, spurred by the recent
feasibility of whole genome scans. Clear guidelines for the interpretation of linkage
results are needed to avoid a flood of false positive claims. At the same time, an overly
cautious approach runs the risk of causing true hints of linkage to be missed. We
address this problem by proposing specific standards designed to maintain rigor while
also promoting communication.



Genetic dissection of complex traits is becoming central to mammalian genetic analysis.
In the fifteen years since it was recognized that genetic inheritance can be traced with
naturally occurring DNA sequence variation (ref. 1), the identification of genes responsible
for simple mendelian traits has become a straightforward, if still demanding, task. Over 500
such genes have been mapped to specific chromosomal regions in the human and more than 60
have been cloned based on their position. These breakthroughs are steadily reshaping
biological and medical thinking. Yet, many of the most important medical conditions —
including heart disease, hypertension, diabetes, asthma, schizophrenia, and manic
depression — show much murkier inheritance patterns. The geneticist's challenge is now to
tease apart the multifactorial causes of these diseases.

In principle, the solution is clear. Genetic mapping of any trait — simple or complex —
boils down to finding those chromosomal regions that tend to be shared among affected
relatives and tend to differ between affecteds and unaffecteds. Conceptually, this amounts
to a three-step recipe: scan the entire genome with a dense collection of genetic markers;
calculate an appropriate linkage statistic S(x) at each position x along the genome; and
identify the regions in which the statistic S shows a significant deviation from what would
be expected under independent assortment.

Yet, these deceptively simple instructions conceal a thorny question: since the statistic
S(x) fluctuates substantially just by chance across an entire genome scan, what constitutes
a 'significant' deviation? What standard should be required for declaring linkage?

Although biologists often greet statistical issues with glazed-eyed indifference, we believe
that the resolution of this particular question has important consequences for the future of
our field. To reach our goal, geneticists must chart a prudent course between Scylla and
Charybdis.

Adopting too lax a standard guarantees a burgeoning literature of false positive linkage
claims, each with its own gene symbol (ASTH56, ASTH57, ...). Scientific disciplines erode
their credibility when a substantial proportion of claims cannot be replicated — even more
so when the claims reach not only the professional journals but also the evening news.
Psychiatric genetics provides a cautionary tale, in which a spate of non-replicable findings
in the mid-1980s undermined support for such studies. It is thus essential that there be a
sufficiently stringent standard that linkage is claimed only when there is a high likelihood
that the assertion will stand the test of time.

On the other hand, adopting too high a hurdle for reporting results runs the risk that the
nascent field will be stillborn. Initial genetic analyses may fall short of the strict
threshold for statistical significance, but may nonetheless point to important regions
deserving intensive investigation. Without channels by which investigators can report such
tentative hints of linkage, the discovery of disease genes may be delayed in an overzealous
attempt to avoid all error.

Striking the right balance requires both a mathemat- 
ical understanding of how often positive results will 
occur just by chance and a value judgment about the 
relative costs of false positives and false negatives. Our 
goal here is to provide an accessible treatment of the
first subject and to offer a concrete proposal regarding 
the second. 



Statistical significance in genome-wide scans
In searching for disease genes, it is important to distinguish between pointwise
significance levels and genome-wide significance levels. The pointwise (also called nominal)
significance level is the probability that one would encounter such an extreme deviation at
a specific locus just by chance. The genome-wide significance level is the probability that
one would encounter a deviation somewhere in a whole genome scan. The former concerns a
single test of the null hypothesis of no linkage; the latter involves fishing over a large
number of tests to find the most significant result.

Consider the following idealized sib pair study. An investigator collects n pairs of
affected sibs, genotypes them using a perfect genetic map that is fully informative at every
point in the genome, and calculates the average proportion π(x) of alleles shared
identical-by-descent at each location x in the genome. Geneticists traditionally report the
results at each location in one of three essentially equivalent ways: a Z-score, a lod
score, or a P value. The Z-score is the number of standard deviations by which π(x) exceeds
its null expectation of 0.50; it follows a normal distribution when n is large. The lod
score (or MLS — Maximum Lod Score, the lod score maximized over a set of parameters) is the
log-likelihood ratio of the data under the hypothesis that the allele sharing proportion has
the observed value π(x) as compared to the hypothesis that there is no excess sharing; the
distribution of this statistic is related to a chi-squared distribution when n is large.
The P value reflects the pointwise chance of observing a deviation as high as π(x) under
independent assortment.

Suppose, for example, that a study of 100 sib pairs reveals an allele sharing proportion of
61% somewhere in the genome. This result corresponds to a Z-score of 3.1, a lod score of
2.1, and a nominal P value of 0.001. Should one be impressed by this finding? It clearly
depends on how often such deviations would arise by chance in a whole genome search.
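
The three figures quoted for the 100-sib-pair example can be checked with a short calculation. The Python sketch below is illustrative only. It assumes a fully informative map, so that under no linkage each pair's identical-by-descent sharing proportion has mean 0.5 and variance 1/8, and it uses the large-sample relation lod = Z^2 / (2 ln 10). The function name `sib_pair_stats` is hypothetical and does not come from the article.

```python
import math

def sib_pair_stats(n_pairs, sharing):
    """Return (Z, lod, one-sided P) for an observed mean IBD sharing proportion.

    Assumption: a fully informative map, so under no linkage each sib pair
    shares 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2, 1/4, giving a
    sharing proportion with mean 0.5 and variance 1/8.
    """
    se = math.sqrt(0.125 / n_pairs)           # standard error of the mean sharing
    z = (sharing - 0.5) / se                  # Z-score
    lod = z * z / (2.0 * math.log(10.0))      # large-sample lod from Z
    p = 0.5 * math.erfc(z / math.sqrt(2.0))   # one-sided normal tail
    return z, lod, p

z, lod, p = sib_pair_stats(100, 0.61)
print(f"Z = {z:.2f}, lod = {lod:.2f}, P = {p:.5f}")
```

Under these assumptions the computed Z and lod round to the 3.1 and 2.1 quoted in the text, and the one-sided P value falls just below 0.001.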
The mathematical theory of large deviations holds the answer, as was pointed out a few
years ago. The expected number of chromosomal regions in which a linkage statistic exceeds
a threshold T is given by a simple formula μ(T), explained in Box 1. In fact, the number of
such regions approximately follows a Poisson distribution with mean μ(T), and the chance of
finding at least one such region is thus 1 - e^(-μ(T)) ≈ μ(T) when μ(T) is small. The
approximation becomes asymptotically exact when the number of sib pairs is large and the
threshold T is high. In fact, it is accurate enough for practical purposes provided that n
is at least 50. (It is worth noting that, while lod score analysis of a small number of
large families can be quite sensitive to changes in a few key data points, non-parametric
statistics based on a large number of small families tend quickly to normal distributions
and tend to be robust.)
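
The Poisson step can be sketched numerically. Box 1 is not reproduced in this excerpt, so the expression used below for μ(T), namely [C + 2ρGT^2] α(T) with ρ = 2 for sib pairs, is an assumption recalled from the literature and should be checked against the article itself. The genome size (23 chromosomes, 3450 cM) is taken from the Fig. 1 caption, and all function names are hypothetical.

```python
import math

def pointwise_p(z):
    """One-sided normal tail probability alpha(T) for a Z-score threshold."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def expected_regions(z, n_chrom=23, genome_morgans=34.5, rho=2.0):
    """Expected number of regions exceeding Z-score threshold z by chance.

    ASSUMPTION (Box 1 is not in this excerpt):
        mu(T) = [C + 2 * rho * G * T^2] * alpha(T)
    with C chromosomes, G the genome length in Morgans, and rho = 2
    taken as the crossover-rate parameter for sib pairs.
    """
    return (n_chrom + 2.0 * rho * genome_morgans * z * z) * pointwise_p(z)

def genome_wide_p(mu):
    """Poisson chance of at least one such region: 1 - exp(-mu)."""
    return 1.0 - math.exp(-mu)

for z in (3.1, 4.1):
    mu = expected_regions(z)
    print(f"Z >= {z}: mu = {mu:.3f}, genome-wide P = {genome_wide_p(mu):.3f}")
```

With these assumed parameters, a threshold near Z = 3.1 gives slightly more than one expected false positive region per scan, while Z = 4.1 gives a genome-wide chance of roughly 5%, in line with the figures the article reports.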



Fig. 1 shows the results in graphical form. Focusing on P values, we expect regions
significant at P = 0.05 to occur about two dozen times by chance (that is, at least once on
most chromosomes); P = 0.01 about 7.6 times; P = 0.001 slightly more than once; P = 0.0001
about 0.2 times; and P = 0.00002 about 0.05 times. In other words, there is a 5% chance of
randomly finding a region with a P value as extreme as 2 x 10^-5. To keep the chance of
encountering a false positive at no more than 5%, one must therefore impose a threshold of
Z ≥ 4.1, lod ≥ 3.6 or P ≤ 2 x 10^-5. With any less stringent threshold, there is a
substantial chance (>5%) of reporting false linkages. To illustrate the point, we describe
a simulated whole-genome scan in Box 2.

Fig. 1 Number of false positives expected in a whole genome scan for a given threshold of
lod score, Z score or pointwise P value. Solid line represents asymptotic expectation for a
perfect genetic map, based on the theory described in Box 1. Symbols represent results for
100 sib pairs obtained from 100,000 simulations using genetic maps with markers spaced every
0.1 cM (circles), every 1 cM (squares), and every 10 cM (triangles). The genome is assumed
to consist of 23 chromosomes, with total length 3450 cM. Note the close correspondence
between the asymptotic theory and the 0.1 cM simulation. The dotted line indicates the 5%
genome-wide significance level.

The standard may seem harsh at first glance, but it accords well with historical practice.
The traditional threshold of lod 3 for classical two-point linkage studies of simple
mendelian traits corresponds to an asymptotic pointwise significance level of P = 10^-4
(ref. 10), or a genome-wide significance level of about 9%. In fact, the lod score
threshold need only be raised to 3.3, corresponding to P = 5 x 10^-5, to achieve the
recommended genome-wide significance level of 5%.

It is worth noting that there is a widespread misconception in human genetics that a lod
score of 3 is equivalent to a significance level of only P = 10^-3. This error is rooted in
a confusion about the meaning of lod scores and P values. Lod scores concern the ratio of
two probabilities, while P values refer to a single absolute probability. Specifically,
lod = 3 means that the observed data is 10^3-fold more likely to arise under a specified
hypothesis of linkage than under the null hypothesis of independent assortment. By
contrast, P = 10^-3 means that the probability of encountering as large a lod score as
observed is 10^-3 under the null hypothesis. One can convert a lod score to a chi-squared
statistic by multiplying by 2(log_e 10) ≈ 4.6 and then use standard statistical tables,
taking into account the one-sided nature of the test, to confirm that a lod score of 2.1
corresponds to P = 10^-3, while a lod score of 3.0 corresponds to the more extreme
P = 10^-4.
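
The conversion described in this paragraph can be written out as a brief Python sketch. It assumes the chi-squared statistic has one degree of freedom and halves the tail probability to reflect the one-sided test; the function name `lod_to_pointwise_p` is hypothetical and does not come from the article.

```python
import math

def lod_to_pointwise_p(lod):
    """Approximate one-sided pointwise P value for a lod score.

    Assumes the statistic 2*ln(10)*lod follows a chi-squared distribution
    with one degree of freedom under the null hypothesis, and halves the
    tail probability because the test is one-sided.
    """
    chi2 = 2.0 * math.log(10.0) * lod        # about 4.6 * lod
    # For one degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

for lod in (2.1, 3.0, 3.3, 3.6):
    print(f"lod = {lod}: P is about {lod_to_pointwise_p(lod):.1e}")
```

Under these assumptions a lod score of 2.1 lands near 10^-3 and a lod score of 3.0 near 10^-4, matching the correspondence stated in the text.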

Are whole-genome thresholds overly stringent? 
Some geneticists might object to imposing such a 
stringent standard for declaring linkage. Certain argu- 
ments have been advanced in the hopes of gaining spe- 
cial dispensation. It is worth considering them in turn. 

• "My study only looked at a few markers (or a few chromosomal regions), so it's not fair
to impose a threshold based on a whole genome search." The extreme example of this argument
would be a geneticist who finds a weakly positive score with the first marker and seeks to
employ the pointwise significance level — asserting that only a single hypothesis has
actually been tested. The fallacy is that the investigator would not have abandoned the
search if the first marker had been negative, but would have persevered until a positive
result was obtained or the entire genome was examined. Having assembled a large patient
collection, the geneticist is committed to a whole genome search. It makes no sense to
employ a different threshold depending on whether the inevitable false positive
fluctuations happen to occur earlier rather than later in the search.

• "My study only involved a genome tcatt with markers 
every 10 cM so it's not fair to impose a threshold based 
on an infinitely dense genetic map." Again, the analysis 
does not stop with the sparse map. Initial hints of link- 
age with a single marker are immediately pursued by 
using multipoint methods and by peppering the region 
with a dense collection of markers. In any region that 
matters, geneticists rapidly extract the complete inner* 
icance information — with the explicit hope of 
increasing the linkage score. 

A hierarchical search, in which one performs a 
genome scan with a sparse map and then follows up 
'interesting' regions with a denser map, is an efficient 
study design (refs 11, 12), but the resulting false positive 
rate is essentially the same as if a dense map had 
been used throughout the genome (D. Siegmund, 
personal communication). This is because the false 
positives are almost invariably included among the 
regions chosen for follow-up. 

Fig. 2 Simulated genome scan with no trait loci segregating. Chromosomal size is 
proportional to genetic length, taken from ref. 33. Multipoint lod scores were 
computed as described (ref. 34). 

The dense-map threshold turns out not to be that 
draconian. If one performed only single point analysis 
with an evenly spaced map, the thresholds for a 
genome-wide significance level of 5% would not be 
dramatically different: the lod score thresholds would 
decrease by about 20% for a 10 cM map; about 15% for a 
5 cM map; about 10% for a 2 cM map; and about 7% for a 
1 cM map (Fig. 1 shows first and last cases). Moreover, these 
thresholds would be appropriate only if one did not use 
multipoint analysis or a denser map to obtain more 
information. To our thinking, it is better to extract the 
full inheritance information and find the best P value. 

In the modern world, it is fair to assume that highly 
motivated investigators squeeze as much information 
as possible from the available family material and their 
results should thus be measured against the corresponding 
threshold for a dense genome scan. (Some 
backsliding might be countenanced if strong prior evidence 
exists to restrict the search to a region; possible 
cases include a true single-point test of a highly relevant 
candidate gene, a test of the HLA region for an autoimmune 
disease, and an X-chromosome scan for a trait 
with convincing prior evidence of sex linkage.) 

Notwithstanding our desire to avoid spurious linkages, 
we must always remember that regions that fall 
short of statistical significance may nonetheless be correct. 
Unfortunately, there is no way to distinguish 
between small peaks that represent weak true positives 
and peaks of the same height arising from random 
fluctuations, assuming that all inheritance information has 
been extracted. It would be irresponsible to consign 
such potentially valuable hints to the dustbin of 
laboratory history. What then is to be done? 

Proposed standards 

Back in the days when linkage studies of even the simplest 
trait required heroic efforts and good fortune, the 
human genetics community adopted standards to promote 
both rigor and communication. Lod scores of 3 
were required to declare linkage in official chromosome 
committee reports, but weaker evidence could 
still be shared in more informal vehicles such as the 
McKusick Newsletter, a predecessor to the modern 
Mendelian Inheritance in Man (ref. 13). 

Clear thinking about complex traits would be served 
by reviving such an approach. Specifically, we propose 
the following classification based on the number of 
times that one would expect to see a result at random 
in a dense, complete genome scan: 

• Suggestive linkage: statistical evidence that would 
be expected to occur one time at random in a genome 
scan. 

• Significant linkage: statistical evidence expected 
to occur 0.05 times in a genome scan (that is, with 
probability 5%). 

• Highly significant linkage: statistical evidence 
expected to occur 0.001 times in a genome scan. 

• Confirmed linkage: significant linkage from one 
or a combination of initial studies that has subsequently 
been confirmed in a further sample, preferably by an 
independent group of investigators. For confirmation, 
a nominal P value of 0.01 should be required (see 
below). 

In the case of sib pair studies, the first three categories 
would correspond to pointwise significance levels 
of 7 × 10^-4, 2 × 10^-5 and 3 × 10^-7 and lod scores of 
2.2, 3.6, and 5.4. The corresponding P values for other 
study designs differ somewhat (Box 1, Table 1). 

Mapping method                     Crossover   Suggestive linkage   Significant linkage 
                                   rate ρ      P value (lod)        P value (lod) 

Lod score analysis in human        1           1.7 × 10^-3 (1.9)    4.9 × 10^-5 (3.3) 

Allele-sharing methods in human 
  sibs and half-sibs               2           7.4 × 10^-4 (2.2)    2.2 × 10^-5 (3.6) 
  grandparent-grandchild           1           1.7 × 10^-3 (1.9)    4.9 × 10^-5 (3.3) 
  uncle-nephew                     5/2         5.6 × 10^-4 (2.3)    1.8 × 10^-5 (3.7) 
  first cousin                     8/3         5.2 × 10^-4 (2.3)    1.6 × 10^-5 (3.7) 
  first cousin, once removed       20/7        4.4 × 10^-4 (2.4)    1.5 × 10^-5 (3.8) 
  second cousin                    16/5        4.2 × 10^-4 (2.4)    1.3 × 10^-5 (3.8) 

QTL mapping in mouse or rat 
  Backcross (1 d.f.)               1           3.4 × 10^-3 (1.9)    1.0 × 10^-4 (3.3) 
  Intercross (1 d.f., additive)    1           3.4 × 10^-3 (1.9)    1.0 × 10^-4 (3.3) 
  Intercross (1 d.f., recessive)   4/3         2.4 × 10^-3 (2.0)    7.2 × 10^-5 (3.4) 
  Intercross (1 d.f., dominant)    4/3         2.4 × 10^-3 (2.0)    7.2 × 10^-5 (3.4) 
  Intercross (2 d.f.)              1.5         1.6 × 10^-3 (2.8)    5.2 × 10^-5 (4.3) 

Sib pair analysis involves no dominance component, and thus each sib pair is equivalent to half-sib 
pairs. Lod score thresholds for the possible triangle method for sib pairs (ref. 37) may be computed by similar 
methods (D. Siegmund, personal communication); these thresholds are 2.6 for suggestive and 4.0 for 
significant linkage. Genome size is assumed to be 3300 cM for the human and 1600 cM for the mouse and the 
rat. A typographical error appeared in the table of ref. 34, which misstated the significant P value for half-sib 
and sib pairs. The correct P value is 2.2 × 10^-5 as shown above. 

Suggestive linkage results will often be wrong, but 
they are worth reporting if accompanied by an 
appropriate warning label about their tenuous nature. 
Investigators concerned about coming up empty-handed 
in a genome scan can take comfort from the 
fact that they can expect, by definition, to find about 
one suggestive linkage for every trait studied. On the 
other hand, journal editors must weigh how much 
attention to accord such results. At the least, specialty 
journals should actively support the reporting of 
suggestive linkages in some format. Indeed, it is worth 
reporting all regions with a nominal P value of P < 
0.05 encountered in a complete genome scan, but 
without any claims of linkage. 

Because suggestive linkages are so speculative, they 
should not be assigned gene names lest medical 
genetics be overrun with illusory loci. Geneticists 
should enter into a non-proliferation pact under 
which gene symbols are reserved for significant 
linkages. Indeed, traditional usage has been to assign 
gene names only to confirmed linkages. The appropriate 
nomenclature committees should take up this 
issue and develop specific guidelines. 

It is worth pointing out that even significant linkages 
will turn out to be false positives 5% of the time, 
that is, once in 20 genome scans. Because of the bias 
that only positive results tend to get reported, the 
observed false positive rate will be higher in the 
published literature. While individual investigators 
cannot do anything about this problem, it offers an 
additional rationale for conservative standards. 

Thomson (ref. 14) recently proposed criteria for putative 
linkage that turn out to be essentially equivalent to our 
standard for suggestive linkage. Unfortunately, these 
criteria have been widely misinterpreted as implying 
genome-wide significance, despite Thomson's clear 
statement to the contrary. In fact, Thomson (pers. 
comm.) endorses the standards proposed above. 

Replications and extensions 
Linkage results must be replicated to be credible. We 
suggest that the term "replication study" should be 
reserved for situations in which significant linkage has 
already been obtained in an initial study (or combination 
of studies). Weaker findings do not merit the same 
standing as prior hypotheses. For example, there will 
be many regions with a nominal P value of 0.05 and 
some will appear to be 'replicated' in a second study 
just by chance (Box 2). We prefer the term "extension 
study" for the process of testing additional families 
in the hope of first reaching the genome-wide 
significance level. Once significant linkage is found, it is 
appropriate to speak of 'replicating' the result. 

Because replication involves testing an established 
prior hypothesis, the multiple testing problem associated 
with genome-wide search does not apply. Nonetheless, 
some caution is still required. The initial 
localization for a linkage is typically spread over a broad 
region of about 20 cM. Because one is searching over an 
interval, there is a multiple testing problem writ small: 
the chance of finding a P value of 0.05 somewhere within 
a 20 cM interval is greater than 5%. It turns out that a 
pointwise P value of about 0.01 is needed for an 
interval-wide significance level of 5%. Accordingly, P = 0.01 
should be required to declare confirmation at the 5% 
level. Note that this correction is equivalent to a multiple 
testing (or Bonferroni) correction for 5 markers. 
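
The arithmetic behind this interval-wide correction can be sketched as follows. This is an illustration only: it takes the text's statement that the 20 cM interval behaves like 5 markers at face value and, as a simplifying assumption not made in the original, treats those 5 tests as independent. The function names are hypothetical.

```python
def bonferroni_threshold(overall_alpha, n_tests):
    """Pointwise P value required so that the overall level is at most overall_alpha."""
    return overall_alpha / n_tests

def chance_of_any_hit(pointwise_p, n_tests):
    """Probability that at least one of n independent tests reaches pointwise_p by chance."""
    return 1.0 - (1.0 - pointwise_p) ** n_tests

n_effective_markers = 5  # figure quoted in the text for a 20 cM interval

print(bonferroni_threshold(0.05, n_effective_markers))  # pointwise level to require
print(chance_of_any_hit(0.05, n_effective_markers))     # well above 5%
print(chance_of_any_hit(0.01, n_effective_markers))     # just under 5%
```

With a pointwise threshold of 0.05 the chance of a spurious "replication" somewhere in the interval is roughly one in five under these assumptions, whereas a threshold of 0.01 keeps it just below 5%.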

Failure to replicate does not necessarily disprove a 
hypothesis. Linkages will often involve weak effects, 
which may turn out to be weaker in a second study. 
Indeed, there is a subtle but systematic reason for this: 
positive linkage results are somewhat biased because 
they include those weak effects that random fluctuations 
helped push above threshold, but exclude slightly 
stronger effects that random fluctuations happened to 
push below threshold. Initial positive reports will thus 
tend to overestimate effects, while subsequent studies 
will regress to the true value (see also ref. 15). Replication 
studies should always state their power to detect 
the proposed effect with the given sample size. Negative 
results are meaningful only if the power is high. 
Regrettably, many reports neglect this issue entirely. 
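
The selection bias described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of any analysis in the article: each study's test statistic is modelled as a normal deviate with unit variance centred on a hypothetical true effect, and only initial studies exceeding a hypothetical reporting threshold are "published" and followed by a replication of the same size.

```python
import random
import statistics

def simulate_selection_bias(true_effect=2.0, threshold=3.0,
                            n_studies=20000, seed=1):
    """Return (mean reported initial statistic, mean replication statistic, count)."""
    rng = random.Random(seed)
    initial, replication = [], []
    for _ in range(n_studies):
        first = rng.gauss(true_effect, 1.0)
        if first > threshold:  # only results above threshold get reported
            initial.append(first)
            # An independent replication of the same locus and sample size.
            replication.append(rng.gauss(true_effect, 1.0))
    return statistics.mean(initial), statistics.mean(replication), len(initial)

initial_mean, replication_mean, n_reported = simulate_selection_bias()
print(f"reported initial studies: {n_reported}")
print(f"mean initial statistic:     {initial_mean:.2f}")
print(f"mean replication statistic: {replication_mean:.2f}")
```

In this toy model the reported initial statistics average well above the true effect, while the replications scatter around it, which is the regression toward the true value that the text describes.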

When several replication studies are carried out, the 
results may conflict — with some studies replicating 
the original findings and others failing to do so. This 
may reflect population heterogeneity, diagnostic 
differences, or simply statistical fluctuation. Careful 
meta-analysis of all studies may be useful to assess 
whether the overall evidence for linkage is convincing. 

Suggestive linkages should be pursued in extension 
studies, in which old and new datasets are combined 
to see whether significant evidence of linkage can 
be found. To combine results among studies, it is 
always best to pool the raw data and re-analyze the 
entire dataset. Lod scores can be added across studies, 
but only when they are computed by the same 
method, with the same set of markers, and at the 
same map position. Other meta-analysis techniques 
exist (ref. 14). 

Statistical aficionados may recognize that extension 
studies involve a subtle multiple-testing problem of 
their own, because a significant result in any of the 
individual datasets or the combined dataset is often 
taken as evidence of linkage. A modest multiple-testing 
correction to the genome-wide significance level 
should therefore be used in extension studies; the 
appropriate correction depends on study design. Of 
course, any combined analysis should include all 
studies, both positive and negative, to avoid biasing 
the results. If the combined analysis yields significant 
linkage, it is then appropriate to undertake a 
replication study. 

Pursuing hunches 

Formal procedures are useful for standardizing the 
general acceptance of linkage claims. Still, gene hunters 
should not be inhibited from pursuing all hints and 
hunches, including: following up all regions with 
pointwise P values of 0.05 (even though many will 
prove to be illusory); being encouraged if they find 
substantially more suggestive linkages than the one 
expected by chance (even though real loci cannot be 
distinguished from false positives); and using 
epidemiological arguments to infer the existence of loci with 
small effects (even though such inferences are highly 
model-dependent). It will, however, be worth having 
rigorous evidence in hand before undertaking 
positional cloning to avoid the unpleasant prospect of 
chasing a phantom locus. 

Hints of linkage are usually followed by testing for 
linkage in larger datasets. Some true susceptibility loci, 
however, may never show significant linkage because 
they confer a very small increased risk and have 
common alleles. The proof that such loci are involved in 
disease aetiology must come from other data. Linkage 
disequilibrium can offer a powerful complement to 
traditional linkage studies. For loci having small effects 
but relatively few alleles in a population, tests of 
linkage disequilibrium can be much more sensitive than 
tests of linkage. A good example is IDDM2, the insulin 
gene, for which strong evidence of linkage disequilibrium 
is obtained in many datasets that fail to show 
linkage (refs 17, 18). 

Linkage disequilibrium can be used in an exploratory 
fashion to pursue suggestive (or weaker) linkages 
(for example, ref. 19). Appropriate correction for 
multiple testing is essential in such applications, because 
multiple regions and many different haplotypes are 
tested for disease association. This topic has not 
received adequate attention and is an important area 
for future statistical research. Finally, we note that 
linkage disequilibrium studies should use family-based 
controls whenever possible to avoid false positive 
findings due to population stratification (refs 20-22). 

Other models and difficulties 

The basic principles above apply to any analysis of 
complex traits, whether by linkage analysis, 
allele-sharing methods, or quantitative trait mapping in 
experimental crosses (Box 1). The pointwise P values 
vary somewhat according to the method, but they are 
typically in the range of 10^-3 to 10^-4 for suggestive and 
10^-4 to 10^-5 for significant linkage. 

Nettlesome problems remain, however. Investigators 
often try out multiple diagnostic schemes for defining 
affection status, as well as multiple models of 
inheritance for linkage analysis. Similarly, studies of 
quantitative traits may examine a large number of 
phenotypes. Datasets are frequently stratified using 
additional criteria, for example HLA genotype. What 
statistical price should be exacted for such fishing over 
multiple models? If the models are statistically 
independent, the observed P values should be multiplied 
by the number of models (which is known as the 
Bonferroni correction). This prescription is too 
conservative in the case of closely related models (such as 
correlated phenotypes), but there is no general 
guidance for how to proceed other than simulation. Even 
simulation poses a challenge, in that millions of 
simulations are needed to accurately estimate P values in 
the range of 10^-5. Techniques such as importance 
sampling can make simulations much more feasible (ref. 23), and 
they should be performed whenever possible. 
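
Both points in this paragraph reduce to short calculations, sketched below. The sketch is illustrative and rests on assumptions not in the original: the function names are hypothetical, the models are taken to be independent for the Bonferroni step, and the simulation count is for plain Monte Carlo (not importance sampling), using the binomial standard error of an estimated proportion.

```python
import math

def bonferroni_adjust(p_value, n_models):
    """Multiply an observed P value by the number of models tried, capped at 1."""
    return min(1.0, p_value * n_models)

def simulations_needed(p_value, relative_se):
    """Replicates needed so that the plain Monte Carlo estimate of p_value
    has the requested relative standard error.

    The estimate is a binomial proportion, so its relative standard error is
    sqrt((1 - p) / (n * p)); solving for n gives the expression below.
    """
    return math.ceil((1.0 - p_value) / (p_value * relative_se ** 2))

# Three diagnostic schemes tried; nominal P = 0.01 for the best one.
print(bonferroni_adjust(0.01, 3))

# Replicates needed to pin down P = 10^-5 to within about 10%.
print(simulations_needed(1e-5, 0.10))
```

The second figure comes out on the order of ten million replicates, which is consistent with the remark that millions of simulations are needed at this level.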

An additional difficulty is that false positive rates can 
be much higher than estimated if model parameters 
(such as gene frequency) are misspecified, if sample 
size is small, and if other assumptions of statistical 
independence are violated. A careful consideration of 
all these factors is beyond the scope of this 
commentary, but they offer an additional reason for caution in 
interpreting linkage results. 

Examples of complex trait analyses 
IDDM. Recent genome scans for insulin dependent 
diabetes mellitus (IDDM) illustrate the issues well. 
Davies and colleagues (ref. 24) used markers at an average 
spacing of 10 cM to survey the genome in 96 sib pairs, 
and then followed up some regions with lod ≥ 1 (P = 
0.05) in two further collections with 102 and 84 sib 
pairs. Sib pairs were analysed together, and also 
divided according to HLA sharing. In the initial screen, only 
HLA met the standard for significant linkage, with lod 
= 8. Two further regions, on chromosomes 8q and X, 
showed suggestive linkage. A total of 20 regions had 
lod ≥ 1, which is not significantly greater than would 
be expected by chance. 

Two regions that fell somewhat short of the criterion 
for suggestive linkage were chosen for followup. A 
region on chromosome 11q (named IDDM4, near 
FGF3) had a P value of 0.01 in the combined dataset, 
but showed suggestive linkage in sib pairs sharing 1 or 
0 alleles at HLA. In fact, an independent study found a 
nearly significant linkage in this region, but only in the 
subset of sibs in which both carried HLA-DR3 (ref. 25). 
Although the two studies were not jointly analyzed, it 
is likely the combined data would reveal significant 
linkage, indicating that this locus is probably real. 
The second region on chromosome 6q (named 
IDDM5) fell short of suggestive linkage in the 
combined dataset; it remains unclear whether there is in 
fact a susceptibility locus in this region. Interestingly, 
the same group subsequently reported that a region on 
chromosome 2 that fell far short of suggestive linkage 
(P = 0.01) showed evidence of linkage disequilibrium 
in some populations (ref. 19). If widely confirmed (see, for 
example, ref. 26), this would underscore the value of 
linkage disequilibrium studies for identification of 
weak susceptibility loci. 

A major contribution of these studies is that they 
demonstrate that there are no other loci with major 
effects comparable to HLA. The authors recognized 
this fact, but provided a valuable spur to further 
investigation by identifying the most promising regions for 
further study. 

The IDDM story remains a work in progress (refs 27, 28). It 
will probably require joint analyses of multiple datasets 
to sort out which of the hints of linkage are real. In 
general it would be valuable if data from published 
genome scans were routinely deposited in an accessible 
form to facilitate such work. 

Schizophrenia. Evidence for a susceptibility locus on 
chromosome 6p in a large collection of pedigrees from 
Ireland was reported by Wang et al. (ref. 29) in a recent issue 
and Straub et al. (ref. 30) in the current issue of this journal. 
The lod scores in these papers came extremely close to 
the standard for significant linkage (corresponding to 
genome-wide significance levels in the range of 
0.05-0.10). The authors carefully stress the need for 
replication. 

Happily, two independent datasets reported in this 
issue by Moises et al. (ref. 31) and Schwab et al. (ref. 32) appear to 
provide such evidence, each showing suggestive 
(and nearly significant) linkages in roughly the same 
region. Although a joint analysis has not yet been 
undertaken, it is clear that chromosome 6p meets the 
standard for significant (and, probably, confirmed) 
linkage. 

Conclusion 

The study of complex traits promises to be among the 
most important and challenging areas of mammalian 
genetics. As with any endeavor, the field will be shaped 
by the standards adopted by its practitioners. The 
traditional threshold of lod 3 has provided a rigorous 
standard that must be met to declare linkage for a 
simple Mendelian trait: it corresponds to a genome-wide 
false positive rate in the neighborhood of 5%. The 
proposed standard for significant linkage simply extends 
this same logic to the situation of complex traits; the 
required P values accord well with the practice in 
human genetics over the past three decades. At the 
same time, the category of suggestive linkage should 
facilitate reporting of tantalizing but unproven 
findings. By adopting clear rules for communication, 
human geneticists will be well prepared for the 
avalanche of information about to descend. 

Acknowledgments 

We thank M. Boehnke, A. Chakravarti, R. Elston, W. Ewens, S. 
Ghosh, J. MacCluer, M. Mahtani, M. McCarthy, J. Nofon, J. Ott, 
N. Schork, D. Siegmund, R. Spielman, J. Terwilliger, G. Thomson, 
J. Todd, and D. Weeks for helpful comments on the manuscript. 
This work was supported in part by grants from the National 
Center for Human Genome Research (to E.S.L. and L.K.). 






nature genetics volume 11 november 1995 






1. Botstein, D., White, R.L., Skolnick, M. & Davis, R.W. Construction of a 
genetic linkage map in man using restriction fragment length 
polymorphisms. Am. J. hum. Genet. 32, 314-331 (1980). 

2. Baron, M. et al. Genetic linkage between X-chromosome markers and 
bipolar affective illness. Nature 326, 289-292 (1987). 

3. Egeland, J.A. et al. Bipolar affective disorders linked to DNA markers on 
chromosome 11. Nature 325, 783-787 (1987). 

4. Kelsoe, J.R. et al. Re-evaluation of the linkage relationship between 
chromosome 11p loci and the gene for bipolar affective disorder in the 
Old Order Amish. Nature 342, 238-243 (1989). 

5. Sherrington, R. et al. Localization of a susceptibility locus for 
schizophrenia on chromosome 5. Nature 336, 164-167 (1988). 

6. Kennedy, J.L. et al. Evidence against linkage of schizophrenia to markers 
on chromosome 5 in a northern Swedish pedigree. Nature 336, 167-170 
(1988). 

7. Baron, M. et al. Diminished support for linkage between manic 
depressive illness and X-chromosome markers in three Israeli pedigrees. 
Nature Genet. 3, 49-55 (1993). 

8. Lander, E.S. & Botstein, D. Mapping mendelian factors underlying 
quantitative traits using RFLP linkage maps. Genetics 121, 185-199 
(1989). 

9. Feingold, E., Brown, P.O. & Siegmund, D. Gaussian models for genetic 
linkage analysis using complete high-resolution maps of identity by 
descent. Am. J. hum. Genet. 53, 234-251 (1993). 

10. Chotai, J. On the lod score method in linkage analysis. Ann. hum. Genet. 
48, 359-378 (1984). 

11. Elston, R.C. Designs for the global search of the human genome by 
linkage analysis. in Proceedings of the XVIth International Biometrics 
Conference 39-51 (Hamilton, New Zealand, 1992). 

12. Brown, D.L., Gorin, M.B. & Weeks, D.E. Efficient strategies for genomic 
searching using the affected-pedigree-member method of linkage 
analysis. Am. J. hum. Genet. 54, 544-552 (1994). 

13. McKusick, V.A. & Edwards, J.H. Unassigned syntenic groups and 
theoretical considerations. Second International Workshop on Human 
Gene Mapping. Cytogenet. Cell Genet. 14 (1975). 

14. Thomson, G. Identifying complex disease genes: progress and 
paradigms. Nature Genet. 8, 108-110 (1994). 

15. Suarez, B.K., Hampe, C.L. & Van Eerdewegh, P. Problems of replicating 
linkage claims in psychiatry. in Genetic Approaches to Mental Disorders 
(eds Gershon, E.S. & Cloninger, C.R.) 23-46 (American Psychiatric 
Association, Washington DC, 1994). 

16. Cox, D.R. & Hinkley, D.V. Theoretical Statistics (Chapman and Hall, 1974). 

17. Spielman, R.S., McGinnis, R.E. & Ewens, W.J. Transmission test for 
linkage disequilibrium: the insulin gene region and insulin-dependent 
diabetes mellitus (IDDM). 

18. Bennett, S.T. et al. Susceptibility to human type 1 diabetes at IDDM2 is 
determined by tandem repeat variation at the insulin gene minisatellite 
locus. Nature Genet. 9, 284-292 (1995). 

19. Copeman, J.B. et al. Linkage disequilibrium mapping of a type 1 diabetes 
susceptibility gene (IDDM7) to chromosome 2q31-q33. Nature Genet. 9, 
80-85 (1995). 

20. Ewens, W.J. & Spielman, R.S. The transmission/disequilibrium test: 
history, subdivision, and admixture. Am. J. hum. Genet. 57, … 

21. … mellitus in the UK: association or founder effect? Hum. mol. Genet. 4, 
1609-1612 (1995). 

22. Thomson, G. Mapping disease genes: family-based association studies. 
Am. J. hum. Genet. 57, 487-498 (1995). 

23. Terwilliger, J.D. & Ott, J. A multi-sample bootstrap approach to the 
estimation of maximized-over-models lod score distributions. 

24. Davies, J.L. et al. A genome-wide search for human type 1 diabetes 
susceptibility genes. Nature 371, 130-136 (1994). 

25. Hashimoto, L. et al. Genetic mapping of a susceptibility locus for 
insulin-dependent diabetes mellitus on chromosome 11q. Nature 371, 161-164 
(1994). 

26. Luo, D., Maclaren, N.K., Huang, H., Muir, A. & She, J. … and 
case-control association analyses of D2S152 in insulin-dependent 
diabetes. Autoimmunity (1995, in press). 

27. Todd, J.A. Genetic analysis of type 1 diabetes using whole genome 
approaches. Proc. natn. Acad. Sci. U.S.A. 92, 8560-8565 (1995). 

28. Luo, D. et al. Affected-sib-pair mapping of a novel susceptibility gene to 
insulin-dependent diabetes mellitus on chromosome 
6q25-q27. Am. J. hum. Genet. 57, 911-919 (1995). 

29. Wang, S. et al. Evidence for a susceptibility locus for schizophrenia on 
chromosome 6pter-p22. Nature Genet. 10, 41-46 (1995). 

30. Straub, R.E. et al. A potential vulnerability locus for schizophrenia on 
chromosome 6p24-22: evidence for genetic heterogeneity. Nature 
Genet. 11, 287-293 (1995). 

31. Moises, H.W. et al. An international two-stage genome-wide search for 
schizophrenia susceptibility genes. Nature Genet. 11, 321-324 (1995). 

32. Schwab, S.G. et al. Evaluation of a susceptibility gene for schizophrenia 
on chromosome 6p by multipoint affected sib-pair linkage analysis. 
Nature Genet. 11, 325-327 (1995). 

33. Gyapay, G. et al. The 1993-94 Généthon human genetic linkage map. 
Nature Genet. 7, 246-339 (1994). 

34. Kruglyak, L. & Lander, E.S. Complete multipoint sib pair analysis of 
qualitative and quantitative traits. Am. J. hum. Genet. 57, 439-454 
(1995). 

35. Kruglyak, L. & Lander, E.S. A nonparametric approach for mapping 
quantitative trait loci. Genetics 139, 1421-1428 (1995). 

36. Feingold, E. Markov processes for modeling and analyzing a new 
genetic mapping method. J. appl. Prob. 30, 766-779 (1993). 

37. Holmans, P. Asymptotic properties of affected-sib-pair linkage analysis. 
Am. J. hum. Genet. 52, 362-374 (1993). 

38. Lander, E.S. & Schork, N.J. Genetic dissection of complex traits. Science 
265, 2037-2048 (1994). 




Genetic Dissection of Complex Traits 

Eric S. Lander* and Nicholas J. Schork 






Medical genetics was revolutionized during the 1980s by the application of genetic 
mapping to locate the genes responsible for simple Mendelian diseases. Most diseases 
and traits, however, do not follow simple inheritance patterns. Geneticists have thus 
begun taking up the even greater challenge of the genetic dissection of complex traits. 
Four major approaches have been developed: linkage analysis, allele-sharing methods, 
association studies, and polygenic analysis of experimental crosses. This article 
synthesizes the current state of the genetic dissection of complex traits, describing the 
methods, limitations, and recent applications to biological problems. 



Human genetics has sparked a revolution 
in medical science on the basis of the 
seemingly improbable notion that one 
can systematically discover the genes 
causing inherited diseases without any 
prior biological clue as to how they function. 
The method of genetic mapping, by 
which one compares the inheritance pattern 
of a trait with the inheritance patterns 
of chromosomal regions, allows one 
to find where a gene is without knowing 
what it is. The approach is completely 
generic, being equally applicable to 
spongiform brain degeneration as to 
inflammatory bowel disease. 

To geneticists, this revolution is really 
nothing new. Genetic mapping of trait-causing 
genes to chromosomal locations dates 
back to the work of Sturtevant in 1913 (1). It 
has been a mainstay of experimental geneticists 
who study fruit flies, nematode worms, 
yeast, and maize and who developed genetic 
maps containing hundreds of genetic markers 
that made it possible to follow the inheritance 
of any chromosomal region in a controlled 
cross. With the advent of recombinant DNA, 
genetic mapping was carried to its logical 
conclusion with the development of positional 
cloning: the isolation of a gene solely on 
the basis of its chromosomal location, without 
regard to its biochemical function. Positional 
cloning was invented by Bender and colleagues, 
who used it to isolate the bithorax 
complex in Drosophila (2), and it rapidly became 
a routine technique in flies and worms. 

Despite its central role in experimental 
organisms, genetic mapping hardly figured in 
the study of humans throughout most of the 
century. There were two reasons: the lack of 
an abundant supply of genetic markers with 
which to study inheritance, and the inability 
to arrange human crosses to suit experimental 
purposes. The key breakthrough was the 
recognition that naturally occurring DNA sequence 
variation provided a virtually unlimited 
supply of genetic markers, an idea first 
conceived of by Botstein and colleagues for 
yeast crosses (3) and subsequently for human 
families (4). With highly polymorphic genetic 
markers, one could trace inheritance in existing 
human pedigrees as if one had set up the 
crosses in the laboratory. These ideas soon led 
to an explosion of interest in the genetic 
mapping of rare human diseases having simple 
Mendelian inheritance. More than 400 such 
diseases have been genetically mapped in this 
manner, and nearly 40 have been positionally 
cloned (5). 

E. S. Lander is with the Whitehead Institute for Biomedical 
Research, Cambridge, MA 02142, USA, and the Department 
of Biology, Massachusetts Institute of Technology, 
Cambridge, MA 02139, USA. N. J. Schork is with 
the Department of Genetics and Center for Human Genetics, 
Case Western Reserve University School of Medicine 
and University Hospitals of Cleveland, Cleveland, 
OH 44106, USA. 

*To whom correspondence should be addressed. 

Human geneticists are now beginning 
to explore a new genetic frontier, driven 
by an inconvenient reality: Most traits of 
medical relevance do not follow simple 
Mendelian monogenic inheritance. Such 
"complex" traits include susceptibilities 
to heart disease, hypertension, diabetes, 
cancer, and infection. The genetic dissection 
of complex traits is attracting many 
investigators with the promise of shedding 
light on old problems and is spawning 
a variety of analytical methods. The 
emerging issues turn out to be relevant 
not just to medical genetics, but to 
fundamental studies of mammalian development 
and applied work in agricultural 
improvement. The field is still at an early 
stage, but it is ready to explode much as it 
has done in recent years with the analysis 
of simple traits. The purpose of this article 
is to synthesize the key challenges and 
methods, to highlight some enlightening 
examples, and to identify further needs. 

Complex Traits 

The term "complex trait" refers to any phenotype 
that does not exhibit classic Mendelian 
recessive or dominant inheritance attributable 
to a single gene locus. In general, 
complexities arise when the simple correspondence 
between genotype and phenotype 
breaks down, either because the same genotype 
can result in different phenotypes (due to 
the effects of chance, environment, or interactions 
with other genes) or different genotypes 
can result in the same phenotype. 

To some extent, the category of complex
traits is all-inclusive. Even the simplest
genetic disease is complex, when looked at
closely. Sickle cell anemia is a classic
example of a simple Mendelian recessive trait.
Yet, individuals carrying identical alleles at
the β-globin locus can show markedly different
clinical courses, ranging from early
childhood mortality to a virtually unrecognized
condition at age 50 (6). The trait of
severe sickle cell anemia is thus complex,
being influenced by multiple genetic factors
including a mapped X-linked locus and an
inferred autosomal locus that can increase
fetal hemoglobin amounts and thereby partially
ameliorate the disease (7).

It is often impossible to find a genetic
marker that shows perfect cosegregation
with a complex trait. The reasons for this
can be ascribed to a few basic problems.

Incomplete penetrance and phenocopy.
Some individuals who inherit a predisposing
allele may not manifest the disease (incomplete
penetrance), whereas others who inherit no
predisposing allele may nonetheless get the
disease as a result of environmental or random
causes (phenocopy). Thus, the genotype at a
given locus may affect the probability of
disease, but not fully determine the outcome.
The penetrance function $f(G)$, specifying the
probability of disease for each genotype G, may
also depend on nongenetic factors such as
age, sex, environment, and other genes. For
example, the risk of breast cancer by ages
40, 55, and 80 is 37%, 66%, and 85% in a
woman carrying a mutation at the BRCA1
locus, as compared with 0.4%, 3%, and 8%
in a noncarrier (8). In such cases, genetic
mapping is hampered by the fact that a
predisposing allele may be present in some
unaffected individuals or absent in some
affected individuals.
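
To make the interplay of incomplete penetrance and phenocopy concrete, the following minimal sketch applies Bayes' rule to the cumulative breast cancer risks quoted above. The carrier frequency of 0.12% is an assumption chosen only for illustration, and the function name is hypothetical; the point is simply that when a predisposing allele is rare, most affected individuals can be phenocopies even though penetrance in carriers is high.

```python
def carrier_posterior(carrier_freq, pen_carrier, pen_noncarrier):
    """Return (P(carrier | affected), P(phenocopy | affected)) by Bayes' rule."""
    affected_carrier = carrier_freq * pen_carrier
    affected_noncarrier = (1.0 - carrier_freq) * pen_noncarrier
    total = affected_carrier + affected_noncarrier
    return affected_carrier / total, affected_noncarrier / total


# Cumulative risks quoted in the text: age -> (carrier, noncarrier).
PENETRANCE = {40: (0.37, 0.004), 55: (0.66, 0.03), 80: (0.85, 0.08)}
CARRIER_FREQ = 0.0012  # assumed carrier frequency, for illustration only

for age, (f_carrier, f_noncarrier) in PENETRANCE.items():
    p_carrier, p_phenocopy = carrier_posterior(CARRIER_FREQ, f_carrier, f_noncarrier)
    print(f"by age {age}: P(carrier | affected) = {p_carrier:.3f}, "
          f"phenocopy fraction = {p_phenocopy:.3f}")
```

Under these assumed inputs the carrier fraction among affected women is largest for early-onset cases, which is consistent with the later suggestion to restrict attention to early onset.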

Genetic (or locus) heterogeneity. Mutations
in any one of several genes may result
in identical phenotypes, such as when the
genes are required for a common biochemical
pathway or cellular structure. This poses
no problem in experimental organisms,
because geneticists can arrange to work
with pure-breeding strains and perform
crosses to assign mutations to complementation
classes. In contrast, medical geneticists
typically have no way to know whether
two patients suffer from the same disease for
different genetic reasons, at least until the
genes are mapped. Examples of genetic
heterogeneity in humans include polycystic
kidney disease (9), early-onset Alzheimer's
disease (10), maturity-onset diabetes of the
young (11), hereditary nonpolyposis colon
cancer (12), ataxia telangiectasia (13), and
xeroderma pigmentosum (14). Retinitis
pigmentosa, involving retinal degeneration,
apparently can result from mutations in any
of at least 14 different loci (15), and
Zellweger syndrome, involving the failure of
peroxisome biosynthesis, from mutations in
any of 13 loci (16). Genetic heterogeneity
hampers genetic mapping, because a chromosomal
region may cosegregate with a disease
in some families but not in others.
Genetic heterogeneity should be distinguished
from allelic heterogeneity, in which
there are multiple disease-causing mutations
at a single gene. Allelic heterogeneity
tends not to interfere with gene mapping.

Polygenic inheritance. Some traits may
require the simultaneous presence of mutations
in multiple genes. Polygenic traits may
be classified as discrete traits, measured by a
specific outcome (for example, development
of type I diabetes or death from myocardial
infarction), or quantitative traits,
measured by a continuous variable [for
example, diastolic blood pressure, fasting
glucose concentrations, or immunoglobulin E
(IgE) titers] whose level may be set by the
combined action of individual quantitative
trait loci. Discrete traits may represent a
threshold effect, produced whenever an
underlying quantitative variable, influenced
by multiple genes, exceeds a critical threshold,
or a pure synthetic effect, requiring the
simultaneous and joint action of each of
several mutations.
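
The distinction between a threshold effect and a pure synthetic effect can be sketched in a few lines. This is a toy model under stated assumptions: the additive liability, the per-locus effect sizes, the noise term, and all function names are hypothetical and are not taken from the article.

```python
import random


def liability(genotype, effects, noise=0.0):
    """Underlying quantitative variable: weighted mutant-allele counts plus noise."""
    return sum(g * e for g, e in zip(genotype, effects)) + noise


def threshold_trait(genotype, effects, threshold, noise=0.0):
    """Threshold effect: the discrete trait appears when liability exceeds the threshold."""
    return liability(genotype, effects, noise) > threshold


def synthetic_trait(genotype):
    """Pure synthetic effect: every locus must carry at least one mutant allele."""
    return all(count > 0 for count in genotype)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so the demo is repeatable
    effects = [1.0, 1.0, 1.0]       # assumed equal effects at three loci
    for _ in range(5):
        genotype = [rng.choice([0, 1, 2]) for _ in effects]
        noise = rng.gauss(0.0, 0.5)  # stands in for environment and chance
        print(genotype,
              threshold_trait(genotype, effects, 2.5, noise),
              synthetic_trait(genotype))
```

In the threshold model no single locus is strictly required, whereas in the synthetic model each one is, which is why the two cases behave differently in mapping.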

Polygenic inheritance is easily demonstrated
in animal crosses, in the transmission
pattern of quantitative traits such as
blood pressure (17), and in the pervasive
"genetic background" effects that represent
the action of modifier genes. For example, a
mutation in the mouse Apc gene causes
numerous intestinal neoplasias and early
death in B6 mice but has barely noticeable
effects when bred into an AKR strain (18).
More generally, the phenotype of "knockout
mice" may vary dramatically on different
strain backgrounds, pointing to previously
unknown interacting genes.

Polygenic inheritance is harder to
demonstrate directly in humans, but it is surely
no less common. One form of retinitis
pigmentosa was shown to be due to strict
digenic inheritance, requiring the presence of
heterozygous mutations at the peripherin/RDS
and ROM1 genes (19) (whose encoded
proteins are thought to interact in the
photoreceptor outer segment disc membrane).
Some forms of Hirschsprung disease
appear to require the simultaneous
presence of mutations on chromosomes 13,
21, and possibly elsewhere (20). Polygenic
inheritance complicates genetic mapping,
because no single locus is strictly required to
produce a discrete trait or a high value of a
quantitative trait [except in the case of a
pure synthetic interaction causing a discrete
trait (21, 22)].

High frequency of disease-causing alleles.
Even a simple trait can be hard to map if
disease-causing alleles D occur at high
frequency in the population. The expected
Mendelian inheritance pattern of disease
will be confounded by the problem that
multiple independent copies of D may be
segregating in the pedigree [often referred
to as bilineality (23)] and that some
individuals may be homozygous for D [in which
case one will not observe linkage between
D and a specific allele at a nearby genetic
marker, because either of the two homologous
chromosomes could be passed to an
affected offspring (24)]. Late-onset Alzheimer's
disease provides an excellent example.
Initial linkage studies found weak evidence
of linkage to chromosome 19q, but
they were dismissed by many observers because
the lod score (logarithm of the likelihood
ratio for linkage) remained relatively
low, and it was difficult to pinpoint the
linkage with any precision (25). The confusion
was finally resolved with the discovery
that the apolipoprotein E type 4 allele
appears to be the major causative factor on
chromosome 19. The high frequency of the
allele (~16% in most populations) had
interfered with the traditional linkage analysis
(26). High frequency of disease-causing
alleles becomes an even greater problem if
genetic heterogeneity is also present.

Other transmission mechanisms. Finally,
mammalian genetics has revealed additional
modes of genetic inheritance. These include
mitochondrial inheritance (in which mitochondria
pass solely through the maternal
germ line, and each meiotic transmission may
involve selection from a potentially mixed
population of mutant and normal organelles);
imprinting (due to differential activity of the
paternal and maternal copies of a gene); and
phenomena due to the expansion of trinucleotide
repeats, such as so-called "anticipation."
These modes of transmission pose little
difficulty when they obey strict rules (as for
imprinting), but they can complicate analysis
when they lead to highly variable transmission
rates [as for some mitochondrial diseases
or trinucleotide repeat diseases (27)] and may
require specialized methods (28).

Genetic Epidemiology 

Before undertaking DNA-based studies
aimed at genetic dissection, one would ideally
like to infer as much as possible about
the genetic basis of a trait on the basis of
the pattern of disease incidence in families
and populations. Such genetic epidemiology
constitutes a major field in its own right
for which excellent reviews exist (29). We
focus on a few key concepts.

Twin studies. Whereas experimental
geneticists can propagate inbred lines with
isogenic genetic constitution, the only
opportunity to examine the expression of a human
trait in a fixed genetic background comes
from the study of monozygotic (MZ) twins
(30). The absolute risk to an MZ twin of an
affected individual provides a direct estimate
of penetrance for a given environment.

Relative risk. The most important
epidemiological parameter is the relative
risk, $\lambda_R$, defined as the recurrence risk for
a relative of an affected person divided by
the risk for the general population. The
subscript R denotes the type of relation;
for example, $\lambda_O$ and $\lambda_S$ are the relative
risks to offspring and sibs, respectively. The
magnitude of $\lambda_R$ is related to the degree of
concordant inheritance for genetic determinants
in affected relative pairs and thus
is related to the ease or difficulty of genetic
mapping, as shown by Risch (31-33).
Genetic mapping is much easier for
traits with high $\lambda$ (for example, $\lambda_S$ > 10)
than for those with low $\lambda$ (for example,
$\lambda_S$ < 2). As an illustration of the range,
$\lambda_S \approx 500$ for cystic fibrosis; 15 for type I
diabetes [of which a factor of 3 to 4 is
attributable to concordance at the human
leukocyte antigen (HLA) complex]; 8.6
for schizophrenia; and 3.5 for type II
diabetes. For a quantitative phenotype, a
similar measure is the heritability of the
trait (34).

Segregation analysis. Segregation analysis
involves fitting a general model to the
inheritance pattern of a trait in pedigrees.
Using a model involving the presence of a
simple Mendelian factor in a background
of multifactorial inheritance, one tries to
estimate key parameters such as the allele
frequency, penetrance, and proportion of
cases explained by the Mendelian factor.
An important example is the work of
Newman et al. and other researchers (35,
36) who showed that the degree of familial
clustering for breast cancer observed in
1579 nuclear families was consistent with
a dominantly acting rare allele (frequency
≈ 0.06%), accounting for 4% of affected
women (but 20% of affected mother-daughter
pairs), in a larger background of
multifactorial causation. Segregation analysis
can be extremely sensitive to biases in
the ascertainment of families [for example,
preferential inclusion of affected
individuals may cause the penetrance to
be greatly overstated (37)], and it may
have little ability to distinguish among
the many possible modes of inheritance
for complex traits (38). Moreover, it can
be especially difficult to estimate the
number of distinct genes influencing a
trait, except in very favorable situations
(39), and to identify penetrance parameters
associated with multiple loci (40).

Defining Diseases 

Given the many problems that can hamper
genetic dissection of complex traits, geneticists
try to stack the deck in their favor. By
narrowing the definition of a disease or
restricting the patient population, it is often
possible to work with a trait that is more
nearly Mendelian in its inheritance pattern
and more likely to be homogeneous. The
extent to which redefinition simplifies the
task of genetic mapping can be measured by
the resulting increase in the relative risk $\lambda_R$.
Although there is no guaranteed method to
increase $\lambda_R$, four criteria are often useful.

Clinical phenotype. For example, when
colon cancer is restricted to cases with extreme
polyposis, the trait becomes a simple
autosomal dominant one, which allowed
positional cloning of the APC gene on
chromosome 5 (41). Other forms of colon
cancer can be distinguished by the phenotype
of replication errors in tumors (42). In
studying hypertension, one can increase $\lambda$
by focusing on cases with combined hypertension
and hyperlipidemia (43).

Age at onset. Breast cancer and Alzheimer's
disease are rendered genetically more
homogeneous by focusing on early-onset
cases [although the latter can be caused by
at least three independent loci (44)].
Similarly, the relative risk for death from heart
attack is much greater for early-onset cases
($\lambda_S \approx 7$ in men and $\approx 15$ in women under
age 65) as compared with late-onset cases
($\lambda_S$ < 2) (45).

Family history. For example, the sister of
a woman with breast cancer has a much
greater risk if her mother is also affected
(35, 36). Hereditary nonpolyposis colon
cancer (12) was genetically mapped by defining
the trait to require the presence of at
least two other affected relatives.

Severity. For continuous traits, it often
pays to consider as affected only those
individuals at the extreme ends of the trait
distribution. For example, one might select
families for a hypertension study on the
basis of the presence of at least one member
with blood pressure exceeding 140/90. Such
selection can greatly increase the ability to
map genes, both in human families (46) and
experimental crosses (47).

Another way to improve the prospects
for genetic dissection is to focus on specific
ethnic groups. Population genetic theory
and data suggest that there will be greater
genetic and allelic homogeneity in a more
genetically isolated group (such as Sardinians,
Basques, Finns, and Japanese) than in
a large, mixed population (such as in New
York City or Los Angeles). Different ethnic
groups may shed light on different aspects of
a disease, which might be much harder to
discern in an outbred population. For example,
it has been suggested that there may
be differences in the genetic etiology of type
II diabetes between Mexican Americans
and Scandinavians, with somewhat higher
frequency of early insulin resistance in the
former and an early pancreatic beta cell
defect in the latter (48). Focusing on a
highly restricted population may also offer
advantages for eventual positional cloning,
because one may be able to exploit linkage
disequilibrium for fine-structure genetic
mapping (discussed below).

Genetic Dissection: 
The Fourfold Way 

The methods available for genetic dissec- 
tion of complex traits fall neatly into four 
categories: linkage analysis, allele-sharing 
methods, association studies in human
populations, and genetic analysis of large
crosses in model organisms such as the mouse
and rat.

Linkage Analysis 

Linkage analysis involves proposing a model
to explain the inheritance pattern of
phenotypes and genotypes observed in a
pedigree (Fig. 1). It is the method of choice
for simple Mendelian traits because the
allowable models are few and easily tested.
However, applications to complex traits can
be more problematic, because it may be
hard to find a precise model that adequately
explains the inheritance pattern.

Formally, linkage analysis consists of
finding a model $M_1$, positing a specific
location for a trait-causing gene, that is much
more likely to have produced the observed
data than a null hypothesis $M_0$, positing no
linkage to a trait-causing gene in the region.
The evidence for $M_1$ versus $M_0$ is measured
by the likelihood ratio,
$\mathrm{LR} = \mathrm{Prob}(\mathrm{Data} \mid M_1) / \mathrm{Prob}(\mathrm{Data} \mid M_0)$,
or, equivalently, by the lod score,
$Z = \log_{10}(\mathrm{LR})$ (49, 50).

Fig. 1. Linkage analysis involves constructing a
transmission model to explain the inheritance of a
disease in pedigrees. The model is straightforward
for simple Mendelian traits but can become
very complicated for complex traits. Linkage
analysis has been applied to hundreds of simple
Mendelian traits, as well as to such situations as
genetic heterogeneity in breast cancer and
two-gene interactions in multiple sclerosis.
The model $M_1$ is typically chosen from
among a family of models $M(\phi)$, where $\phi$ is
a parameter vector that might specify such
information as the location of the trait-causing
locus, the allele frequency at the
trait and marker loci, the penetrance function,
and the transmission frequencies from
parent to child. Many of these parameters
may already be known (such as penetrance
functions from prior segregation analysis or
marker allele frequencies from population
surveys). The remaining, unknown parameters
are chosen to be the maximum likelihood
(ML) estimate, that is, the value $\hat{\phi}$
that makes the data most likely to have
occurred (51). The null model $M_0$ corresponds
to a specific null hypothesis about
the parameters, $\phi_0$.

For example, the model for a simple
Mendelian recessive or dominant disease
is completely specified except for the
recombination frequency $\theta$ between the
disease gene and a marker; the null hypothesis
of nonlinkage corresponds to $\theta$ =
50% recombination.
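
For the simplest textbook case, the lod score can be written out directly. The sketch below assumes fully informative, phase-known meioses, so that each meiosis can be scored as recombinant or nonrecombinant between the disease gene and the marker; that assumption, the grid search, and the function names are illustrative and are not taken from the article. Under it, the likelihood ratio against the null of 50% recombination is $\theta^{k}(1-\theta)^{n-k} / 0.5^{n}$ for k recombinants among n meioses.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Lod score Z(theta) for phase-known meioses versus theta = 0.5."""
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    nonrecombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + nonrecombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants, meioses, steps=500):
    """Grid-search the ML recombination fraction; return (theta_hat, Z_max)."""
    thetas = [0.5 * k / steps for k in range(1, steps + 1)]
    best = max(thetas, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)


theta_hat, z_max = max_lod(recombinants=1, meioses=10)
print(f"theta_hat = {theta_hat:.3f}, Z_max = {z_max:.2f}")
```

Real pedigrees are rarely phase-known, so likelihoods generally have to be computed over all consistent genotype configurations; this sketch only shows how the score itself is defined.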

The ML model $M(\hat{\phi})$ is accepted (compared
with $M_0$) if the corresponding maximum
lod score Z is large, that is, exceeds a
critical threshold T. Of course, a crucial
issue is the appropriate significance threshold.
The traditional lod score threshold has
been 3.0 (50, 52), although the appropriateness
of this choice is discussed in the
section on statistical significance.

Applications. Linkage analysis is the current
workhorse of human genetic mapping,
having been applied to hundreds of simple
monogenic traits. Linkage analysis has also
been successfully applied to genetically
heterogeneous traits in some cases. The simplest
situation is when unequivocal linkage
can be demonstrated in a single large pedigree
(with $Z \gg 3$), even though other
families may show no linkage. This has
been done for such diseases as adult polycystic
kidney disease, early-onset Alzheimer's
disease, and psoriasis (53). If linkage
cannot be established on the basis of any
single pedigree, one can ask whether a subset
of the pedigrees collectively shows evidence
of linkage. Of course, one cannot
simply choose those families with positive
lod scores and exclude those with negative
lod scores, as such an ex post selection
criterion will always produce a high positive
lod score. Instead, one must explicitly allow
for genetic heterogeneity within the linkage
model (through the inclusion of an admixture
parameter $\alpha$ specifying the proportion
of linked families), although care is required
because the resulting lod score has irregular
statistical properties (54). Alternatively,
families can be selected on the basis of a
priori considerations. An example of this
approach is provided by the genetic mapping
of a gene for early-onset breast cancer
(BRCA1) to chromosome 17q (55). Families
were added to the linkage analysis in
order of their average age of onset, resulting
in a lod score that rose steadily to a peak
of Z = 6.0 with the inclusion of families
with onset before age 47 and then fell with
the addition of later-onset pedigrees.
Notwithstanding these successes, many failed
linkage studies may result from cryptic
heterogeneity. It is always wise to try to redefine
traits to make them more homogeneous.

Linkage analysis can also be applied
when penetrance is unknown. One approach
is to estimate the ML value of the
penetrance p within the linkage analysis. A
particular concern is to avoid incorrectly
overestimating p, because this can lead to
spurious evidence against linkage (caused
by individuals who inherit a trait-causing
allele but are unaffected). One can guard
against this problem by performing an
affecteds-only analysis, in which one records
unaffected individuals as "phenotype
unknown" or, equivalently, sets the penetrance
artificially low ($p \approx 0$). This approach
was important in studies of both
early-onset and late-onset Alzheimer's disease
(25, 56). In the latter case, the lod
score increased from 2.20 with an age-adjusted
penetrance function to 4.38 with an
affecteds-only analysis.

Some traits are so murky that it is unclear
who should be considered affected.
Psychiatric disorders fall into this category,
and investigators have explored using various
alternative diagnostic schemes within
their analysis. For example, schizophrenia
might be defined strictly to include only
patients meeting the Diagnostic and Statistical
Manual of Mental Disorders (DSM) criteria
or be defined more loosely to include
patients with so-called schizoid personality
disorders (57). This approach is permissible
in theory but requires great care in adjusting
the significance level to offset the effect of
multiple hypothesis testing.

Linkage analysis can also be extended to
situations in which two or more genes play
a role in the inheritance of a disease, simply
by examining the inheritance pattern of
pairs of regions. Such an approach has been
dubbed simultaneous search (21, 58, 59). It
can be applied to the situation of a genetically
heterogeneous trait or to an interaction
between two loci. Multiple sclerosis in
large Finnish kindreds has been reported to
be linked to the inheritance of both HLA
on chromosome 6 and the gene for myelin
basic protein on chromosome 18, on the
basis of such a two-locus analysis (60).

Limitations. Linkage analysis is subject to
the same limitations as any model-based
method. It can be very powerful provided
that one specifies the correct model (61,
62). Use of the wrong model, however, can
lead one to miss true linkages and sometimes
to accept false linkages (63, 64). In
particular, exclusion mapping of regions can
only demonstrate absence of a trait-causing
locus fitting the particular model tested
(50, 52). Finally, testing many models requires
the use of a higher significance level,
which may decrease the power to detect a
gene; this issue is discussed in the section on
statistical significance. The more complex
the trait, the harder it is in general to use
linkage analysis (65).

Computation. Calculating the likelihood
ratio can be horrendously complicated in
some cases and requires computer programs
(66, 67). Elston and Stewart invented the
first practical algorithm for calculating
likelihoods (68, 69), which was implemented
by Ott in the first general-purpose linkage
program LIPED (70) and is also at the heart
of the widely used LINKAGE package (71).
However, the algorithm is not a complete
panacea. In its original form it does not
easily accommodate environmental or polygenic
covariation among family members,
which form the basis of so-called "mixed
models" (67, 72) used widely in genetic
epidemiology (73). In addition, it can be
extremely slow for analysis with many genetic
markers or inbred families. Alternative
exact algorithms have been developed
for some applications (74), including one
that allows multipoint homozygosity mapping
(75), but these tend to be limited to
smaller pedigrees. Likelihoods can also be
estimated by simulation-based methods,
such as the Gibbs sampler and Monte
Carlo Markov chains (76). There remain
many important computational challenges
in linkage analysis.

Allele-Sharing Methods

Allele-sharing methods are not based on
constructing a model, but rather on rejecting
a model. Specifically, one tries to prove
that the inheritance pattern of a chromosomal
region is not consistent with random
Mendelian segregation by showing that affected
relatives inherit identical copies of
the region more often than expected by
chance (Fig. 2). Because allele-sharing
methods are nonparametric (that is, assume
no model for the inheritance of the trait),
they tend to be more robust than linkage
analysis: affected relatives should show excess
allele sharing even in the presence of
incomplete penetrance, phenocopy, genetic
heterogeneity, and high-frequency disease
alleles. The tradeoff is that allele-sharing
methods are often less powerful than a
correctly specified linkage model.

Allele-sharing methods involve studying
affected relatives in a pedigree to see how
often a particular copy of a chromosomal
region is shared identical-by-descent (IBD),
that is, is inherited from a common ancestor
within the pedigree. The frequency of IBD
sharing at a locus can then be compared
with random expectation. Formally, one
can define an identity-by-descent
affected-pedigree-member (IBD-APM) statistic

$$T(s) = \sum_{(i,j)} x_{ij}(s)$$

where $x_{ij}(s)$ is the number of copies shared
IBD at position s along a chromosome, and
where the sum is taken over all distinct pairs
(i, j) of affected relatives in a pedigree. The
results from multiple families can be combined
in a weighted sum T(s). Assuming random
segregation, T(s) tends to a normal distribution
with a mean $\mu$ and variance $\sigma^2$ that
can be calculated on the basis of the kinship
coefficients of the relatives compared (77,
78). Deviation from random segregation is
detected when the statistic $(T - \mu)/\sigma$
exceeds a critical threshold (see the section on
statistical significance).
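
As a minimal sketch of the standardized statistic, consider the special case in which every family contributes exactly one affected sib pair and all families receive unit weight; both are simplifying assumptions made here for illustration, and the function name is hypothetical. Under random segregation a sib pair shares 0, 1, or 2 copies IBD with probabilities 1/4, 1/2, and 1/4, so each pair has mean 1 and variance 1/2, and for n independent pairs the mean is n and the variance is n/2.

```python
import math


def sib_pair_apm_z(ibd_counts):
    """Standardized IBD sharing (T - mu) / sigma for independent affected sib pairs.

    ibd_counts holds, for each pair, the number of copies (0, 1, or 2)
    shared IBD at the position being tested.
    """
    n = len(ibd_counts)
    if n == 0:
        raise ValueError("need at least one sib pair")
    t = sum(ibd_counts)            # T(s): total copies shared IBD
    mu = 1.0 * n                   # null mean: one copy per pair
    sigma = math.sqrt(0.5 * n)     # null variance: 1/2 per pair
    return (t - mu) / sigma


# Eight hypothetical pairs showing more sharing than expected by chance.
print(sib_pair_apm_z([2, 2, 1, 2, 1, 2, 2, 1]))
```

For general pedigrees the null mean and variance depend on the relationships among the affected relatives and cannot be read off this simple formula.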

Sib pairs. Affected sib pair analysis is the
simplest form of this method. For example,
two sibs can show IBD sharing for zero, one,
or two copies of any locus (with a 25%-
50%-25% distribution expected under random
segregation). If both parents are available,
the data can be partitioned into separate
IBD sharing for the maternal and
paternal chromosome (zero or one copy,
with a 50%-50% distribution expected under
random segregation). In either case,
excess allele sharing can be measured with a
simple $\chi^2$ test (79-81).
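
A chi-square goodness-of-fit test against the 25%-50%-25% expectation can be sketched with the standard library alone. The counts below are hypothetical, and the function name is an illustrative choice. With three categories the test has 2 degrees of freedom, for which the chi-square survival function is exactly exp(-x/2).

```python
import math


def sib_pair_chi2(n0, n1, n2):
    """Goodness-of-fit of observed IBD counts (0, 1, 2 copies) to 1/4, 1/2, 1/4."""
    n = n0 + n1 + n2
    expected = (0.25 * n, 0.5 * n, 0.25 * n)
    chi2 = sum((obs - exp) ** 2 / exp
               for obs, exp in zip((n0, n1, n2), expected))
    p_value = math.exp(-chi2 / 2.0)  # exact for 2 degrees of freedom
    return chi2, p_value


# Hypothetical data: 100 affected sib pairs with excess sharing.
chi2, p = sib_pair_chi2(n0=10, n1=50, n2=40)
print(f"chi2 = {chi2:.1f}, p = {p:.2e}")
```

This test is sensitive to any departure from the expected proportions, not only excess sharing, so the direction of the deviation should be checked separately.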

Fig. 2. Allele-sharing methods involve testing
whether affected relatives inherit a region
identical-by-descent (IBD) more often than expected
under random Mendelian segregation. Affected
sib pair analysis is a well-known special case, in
which the presence of a trait-causing gene is
revealed by more than the expected 50% IBD allele
sharing. The method is more robust to genetic
complications than linkage analysis but can be
less powerful than a correctly specified linkage
model. Examples include applications to type I
diabetes, essential hypertension, IgE levels, and
bone density in postmenopausal women.

Sib pair studies have played an important
role in the study of type I diabetes.
Excess allele sharing confirmed the important
role of HLA, although the inheritance
pattern fit neither a simple dominant nor
recessive model (82, 83). With the availability
of a comprehensive human genetic
linkage map, sib pair analysis has been applied
to a whole-genome scan, and excess
allele sharing has been found at a locus on
chromosome 11q, pointing to a previously
unidentified causal factor in type I diabetes
(84). In a similar search restricted to the X
chromosome, brothers concordant for the
trait of homosexual orientation showed significant
excess allele sharing (33 out of 40
cases) in the region Xq28, suggesting the
involvement of a genetic factor influencing
at least the particular subtype of homosexuality
studied (85). The same approach can
be applied to affected uncle-nephew pairs
and cousin pairs, for example.
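
The sharing count of 33 out of 40 pairs quoted above can be compared with the 50% expectation using an exact one-sided binomial tail. This is only an illustrative calculation and not the analysis performed in the cited study: it assumes independent pairs, a single tested position, and no correction for the number of markers examined. The function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
               for k in range(successes, trials + 1))


p_value = binomial_upper_tail(33, 40)
print(f"one-sided nominal p-value = {p_value:.2e}")
```

A nominal value of this kind must still be judged against a threshold that accounts for multiple testing.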
IBD versus IBS. One often cannot tell
whether two relatives inherited a chromosomal
region IBD, but only whether they
have the same alleles at genetic markers in
the region, that is, are identical by state
(IBS). It is usually safe to infer IBD from
IBS when a dense collection of highly polymorphic
markers has been examined, but
the early stages of genetic analysis may involve
sparser maps with less informative
markers. Two approaches have been developed
to cope with this important practical
difficulty. The first amounts to inferring
IBD sharing on the basis of the marker data
(expected IBD-APM methods) (86), whereas
the second uses another statistic based explicitly
on IBS sharing (IBS-APM method)
(78, 87). (The inventors of the latter method
dubbed it simply the APM method, but
we prefer the more descriptive names used
here.) Both approaches are important, although
key statistical and computational
issues remain open for each.

A number of recent studies have applied
IBS-APM methods to complex traits. The
angiotensinogen gene has been shown with
IBS-APM analysis to be linked to essential
hypertension in multiplex families, although
the gene explains only a minority of
the phenotype (88). Similarly, linkage of
late-onset Alzheimer's disease to chromosome
19 could be established by IBS-APM,
even though traditional lod score analysis
gave more equivocal results (25).

Quantitative traits. Allele-sharing methods
can also be applied to quantitative
traits. An approach proposed by Haseman
and Elston (89) is based on the notion that
the phenotypic similarity between two relatives
should be correlated with the number
of alleles shared at a trait-causing locus.
Formally, one performs regression analysis
of the squared difference $\Delta^2$ in a trait between
two relatives and the number x of
alleles shared IBD at a locus. The approach
can be suitably generalized to other relatives
(90) and multivariate phenotypes
(91). It has been used, for example, to relate
serum IgE levels with allele sharing in the
region of the gene encoding interleukin-4
and bone density in postmenopausal women
with allele sharing in the region of the
vitamin D receptor (92, 93). In addition,
there has been a resurgence of interest in
the theoretical aspects of mapping genes
with IBD and IBS methods (94).
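
The regression idea can be sketched as an ordinary least-squares fit of the squared trait difference on the number of alleles shared IBD. The data below are invented for illustration, and the function name is hypothetical; a clearly negative slope, meaning pairs that share more alleles are more alike, is the pattern that would suggest a trait-influencing locus near the marker.

```python
def haseman_elston_fit(squared_diffs, ibd_shared):
    """Least-squares slope and intercept of squared trait difference on IBD count."""
    n = len(squared_diffs)
    if n != len(ibd_shared) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    mean_x = sum(ibd_shared) / n
    mean_y = sum(squared_diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in ibd_shared)
    if sxx == 0:
        raise ValueError("IBD counts must vary across pairs")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ibd_shared, squared_diffs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical relative pairs: IBD count and squared trait difference.
ibd = [0, 0, 1, 1, 1, 2, 2]
delta_sq = [4.1, 3.7, 2.6, 2.2, 2.9, 1.0, 1.3]
slope, intercept = haseman_elston_fit(delta_sq, ibd)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Judging whether the slope is significantly negative requires a standard error and a test, which this sketch omits.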

APM methods have been applied to
whole-genome searches only in a few cases,
including a recent study on manic depression
(95). This situation is certain to
change in the near future.

Association Studies 

Association studies do not concern familial
inheritance patterns at all. Rather, they are
case-control studies based on a comparison
of unrelated affected and unaffected individuals
from a population (Fig. 3). An allele
A at a gene of interest is said to be associated
with the trait if it occurs at a significantly
higher frequency among affected
compared with control individuals. The statistical
analysis is simple, involving only a
2 × 2 contingency table. The biggest potential
pitfall of association studies is in the
choice of a control group (which is in sharp
contrast to linkage and allele-sharing methods,
which require no control group because
they involve testing a specific model of
random Mendelian segregation within a
family). Although association studies can
be performed for any random DNA polymorphism,
they are most meaningful when
applied to functionally significant variations
in genes having a clear biological
relation to the trait.
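
The 2 × 2 analysis can be sketched as a Pearson chi-square test together with an odds ratio. The counts are hypothetical and the function name is an illustrative choice. The test has 1 degree of freedom, for which the chi-square survival function equals erfc(sqrt(x/2)).

```python
import math


def association_2x2(case_with, case_without, control_with, control_without):
    """Pearson chi-square (1 df), odds ratio, and p-value for a 2 x 2 table.

    Rows are cases and controls; columns are carriers and noncarriers of
    allele A. All margins and the product case_without * control_with must
    be nonzero.
    """
    a, b, c, d = case_with, case_without, control_with, control_without
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, odds_ratio, p_value


# Hypothetical study: 100 cases and 100 controls.
chi2, odds, p = association_2x2(60, 40, 30, 70)
print(f"chi2 = {chi2:.2f}, odds ratio = {odds:.2f}, p = {p:.2e}")
```

A small p-value here says nothing about whether the controls were well matched to the cases, which is the main pitfall noted above.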

Association studies have played a crucial
role in implicating the HLA complex in the
etiology of autoimmune diseases. The allele
HLA-B27, for example, occurs in 90% of
patients with ankylosing spondylitis but
only 9% of the general population (96).
There are scores of HLA associations involving
such diseases as type I diabetes,
rheumatoid arthritis, multiple sclerosis, celiac
disease, and systemic lupus erythematosus
(97). More recently, association studies
played a key role in implicating the
apolipoprotein E gene in both late-onset
Alzheimer's disease and heart disease and
the angiotensin converting enzyme (ACE)
gene in myocardial infarction (98). In addition,
methods for assessing associations
between marker loci and quantitative traits
have received recent attention (99).

Fig. 3. Association studies test whether a particular
allele occurs at higher frequency among
affected than unaffected individuals. Association
studies thus involve population correlation,
rather than cosegregation within a family. Examples
include HLA associations in many autoimmune
diseases, apolipoprotein E4 in Alzheimer's,
and angiotensin converting enzyme
(ACE) in heart disease.

What does a positive association imply 
about a disease? On its own, very little. 
Associations can arise for three reasons, one 
of which is completely artifactual. 

1) Positive association can occur if 
allele A is actually a cause of the disease. In 
this case, the same positive association 
would be expected to occur in all 
populations (100). 

2) Positive association can also occur if 
allele A does not cause the trait but is in 
linkage disequilibrium with the actual 
cause, that is, A tends to occur on those 
chromosomes that also carry a trait-causing 
mutation. Linkage disequilibrium will arise 
in a population when two conditions are 
met: most cases of the trait are due to 
relatively few distinct ancestral mutations 
at a trait-causing locus, and the marker 
allele A was present on one of these 
ancestral chromosomes and lies close enough to 
the trait-causing locus that the correlation 
has not yet been eroded by recombination 
during the population's history. Linkage 
disequilibrium is most likely to occur in a 
young, isolated population. 

True associations due to linkage 
disequilibrium can yield seemingly contradictory 
results. Because linkage disequilibrium 
depends on a population's history, a trait 
might show positive association with allele 
A1 in one isolated population, with allele 
A2 in a second isolated population, and with 
no allele in a large, mixed population. 
Moreover, a trait may show no association 
with an Eco RI restriction fragment length 
polymorphism (RFLP) in a gene but strong 
association with a nearby Bam HI RFLP, 
because of the particular population genetic 
features of a population (101). 

3) Most disturbingly, positive 
association can also arise as an artifact of 
population admixture. In a mixed population, any 
trait present at a higher frequency in an 
ethnic group will show positive association 
with any allele that also happens to be more 
common in that group. To give a 
lighthearted example, suppose that a would-be 
geneticist set out to study the "trait" of 
ability to eat with chopsticks in the San 
Francisco population by performing an 
association study with the HLA complex. 
The allele HLA-A1 would turn out to be 
positively associated with ability to use 
chopsticks, not because immunological 
determinants play any role in manual 
dexterity, but simply because the allele 
HLA-A1 is more common among Asians than 
Caucasians. 

This problem has afflicted many 
association studies performed in inhomogeneous 
populations ranging from the population of 
metropolitan Los Angeles to Native 
American tribes. A subtle example arose because 
Pima Amerindians are much more 
susceptible than Caucasians to type II diabetes. 
Studies in the Pima showed association 
between type II diabetes and the Gm locus, 
with the "protective" allele being the one 
present at higher frequency in Caucasians. 
Subsequent work, however, revealed that 
the association apparently arose because tribe 
members have different degrees of 
Caucasian ancestry: The presence of a 
"Caucasian" allele at any gene tends to correlate 
with a higher degree of Caucasian ancestry, 
which in turn tends to correlate with a 
lower risk of type II diabetes (102). 
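
The admixture artifact can be made concrete with a small numerical sketch. All frequencies below are hypothetical assumptions, as are the helper names `table` and `odds_ratio`. Within each subpopulation the allele and the trait are constructed to be independent (odds ratio 1), yet pooling the two groups produces an apparent "protective" effect of the allele, simply because the allele is more common in the lower-risk group.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]].

    Rows: affected / unaffected. Columns: carries allele / does not.
    """
    return (a * d) / (b * c)

def table(n, trait_freq, allele_freq):
    """Expected 2 x 2 counts when trait and allele are independent."""
    aff, unaff = n * trait_freq, n * (1 - trait_freq)
    return (aff * allele_freq, aff * (1 - allele_freq),
            unaff * allele_freq, unaff * (1 - allele_freq))

# Hypothetical subpopulations (all numbers are illustrative assumptions).
group1 = table(1000, trait_freq=0.40, allele_freq=0.10)  # high risk, allele rare
group2 = table(1000, trait_freq=0.10, allele_freq=0.50)  # low risk, allele common
pooled = tuple(x + y for x, y in zip(group1, group2))

print("group 1 odds ratio:", round(odds_ratio(*group1), 3))
print("group 2 odds ratio:", round(odds_ratio(*group2), 3))
print("pooled odds ratio: ", round(odds_ratio(*pooled), 3))
```

The pooled odds ratio falls well below 1 even though neither group shows any association on its own, which is the pattern described for the Pima study.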

To prevent spurious associations arising 
from admixture, a number of steps should be 
taken. 

1) If possible, association studies should 
be performed within relatively homoge- 
neous populations. If an association can 
only be found in large, mixed populations 
but not in homogeneous groups, one should 
suspect admixture. 

2) Given the difficulty of selecting a 
control group that is perfectly matched for 
ethnic ancestry, association studies should 
use an "internal control" for allele 
frequencies: a study of affected individuals and their 
parents. If the parents have genotypes A1/A2 
and A3/A4 and the affected individual 
has genotype A1/A3, then the genotype 
A2/A4 (consisting of the two alleles that the 
affected individual did not inherit) provides 
an "artificial control" that is well matched 
for ethnic ancestry. This method is 
sometimes called the affected family-based 
control or haplotype relative risk method and 
can be applied either to the genotypes or to 
the alleles (103). In our opinion, such 
internal controls should be routinely used. 

Collecting parental DNA is useful for a 
second, unrelated reason. With knowledge 
of parental genotypes, one can construct 
multimarker haplotypes (indicating the 
alleles found on the same maternally or 
paternally derived chromosome), which can 
be much more informative than studying 
single markers one at a time. This can be 
especially useful in isolated populations, 
where only a limited number of distinct 
trait-causing chromosomes may be present. 

3) Once a tentative association has 
been found, it should be subjected to a 
transmission disequilibrium test (TDT) 
(104, 105). The test has the premise that a 
parent heterozygous for an associated allele 
A1 and a nonassociated allele A2 should 
more often transmit A1 than A2 to an 
affected child. The TDT was first applied to 
the puzzling situation of the insulin gene, 
which showed strong association but no 
linkage to type I diabetes; linkage had 
been obscured because of the substantial 
proportion of homozygous (and thus 
nonsegregating) parents (104). It should be 
noted that the TDT cannot be directly 
applied to the sample in which initial 
association was found (because affected 
individuals necessarily have an excess of the 
associated allele) but rather to a new 
sample from the same population. 
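
The two family-based ideas in the steps above, the "artificial control" genotype built from untransmitted parental alleles and the transmission disequilibrium test, can be sketched as follows. The function names and the counts are hypothetical. The TDT statistic is written here in the McNemar-type form (b - c)^2 / (b + c), which is an assumption of this sketch; the text states only the premise of the test.

```python
import math
from collections import Counter

def untransmitted_genotype(father, mother, child):
    """Return the two parental alleles that the child did not inherit.

    Genotypes are 2-tuples of allele labels. Raises ValueError if the
    child cannot have received one allele from each parent.
    """
    a, b = child
    consistent = (a in father and b in mother) or (b in father and a in mother)
    if not consistent:
        raise ValueError("child genotype incompatible with parents")
    remaining = Counter(father) + Counter(mother)
    remaining.subtract(child)
    return tuple(sorted(remaining.elements()))

def tdt_statistic(n_transmitted, n_untransmitted):
    """McNemar-type statistic over heterozygous A1/A2 parents.

    n_transmitted: parents who passed A1 to an affected child.
    n_untransmitted: parents who passed A2 instead.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    total = n_transmitted + n_untransmitted
    if total == 0:
        raise ValueError("no heterozygous parents")
    stat = (n_transmitted - n_untransmitted) ** 2 / total
    return stat, math.erfc(math.sqrt(stat / 2.0))

print(untransmitted_genotype(("A1", "A2"), ("A3", "A4"), ("A1", "A3")))
stat, p = tdt_statistic(60, 40)  # hypothetical transmission counts
print(f"TDT chi-square = {stat:.2f}, P = {p:.4f}")
```

Only heterozygous parents enter the TDT count, which is why a large share of homozygous parents reduces the information available.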

The controversy over a reported 
association between alcoholism and an allele at 
the dopamine D2 receptor (DRD2) 
illustrates all the issues in association studies. 
The initial study compared postmortem 
samples from 35 alcoholics and 35 controls, 
with no attempt to control for ethnic 
ancestry (other than race) (106). For a Taq I 
RFLP located about 10 kb downstream from 
DRD2, the A1 allele was found to be 
present in 69% of alcoholics and 27% of 
controls. Attempts to replicate this finding, 
however, have yielded conflicting results, 
with some authors finding no association 
whatsoever and others reporting association 
for severe alcoholism only (107). 
Revealingly, the frequency of the polymorphism 
has been shown to vary substantially among 
populations and among the various 
"control" groups used. In light of this variation, 
it is imperative that studies use internal 
control genotypes, although this has not 
been done to date. Association studies in 
relatively homogeneous populations, 
linkage studies, and transmission tests have all 
been negative (108). At present, there is no 
compelling evidence that the reported 
association is not an artifact of admixture. 

Association studies are not well suited to 
whole-genome searches in large, mixed 
populations. Because linkage disequilibrium 
extends over very short distances in an old 
population (109), one would need tens of 
thousands of genetic markers to "cover" the 
genome. Moreover, testing many markers 
raises a serious problem of multiple 
hypothesis testing; each association test is nearly 
independent. Testing n loci each with k 
alleles amounts to performing about n(k - 
1) independent tests, and the required 
significance level should be divided by this 
factor. A nominal significance level of P = 
0.0001 is thus needed simply to achieve an 
overall false positive rate of 5%, if one tests 
100 markers with six alleles each. (Some 
authors propose to avoid this problem by 
identifying all results significant at the P = 
0.05 level in an initial sample and then 
attempting to replicate them in a second 
sample (110). However, the same multiple 
testing issue still applies to retesting many 
results at the second stage.) Genomic 
search for association may be more 
favorable in young, genetically isolated 
populations because linkage disequilibrium 
extends over greater distances, and the 
number of disease-causing alleles is likely to be 
fewer (111). 
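
The multiple-testing arithmetic in this paragraph can be checked directly. The sketch below assumes, as the text does, that the n(k - 1) tests are treated as independent; the function name `nominal_level` is hypothetical.

```python
def nominal_level(n_loci, alleles_per_locus, overall_rate=0.05):
    """Bonferroni-style nominal level for n loci with k alleles each.

    Returns (number_of_tests, nominal_significance_level).
    """
    n_tests = n_loci * (alleles_per_locus - 1)
    return n_tests, overall_rate / n_tests

n_tests, level = nominal_level(100, 6)
print(f"{n_tests} tests, nominal P = {level:.4g}")
```

With 100 markers of six alleles each this gives 500 tests and a nominal level of 0.0001, the figure quoted above.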

In summary, linkage-type studies and 
association studies have many crucial 
differences. Association studies test whether a 
disease and an allele show correlated 
occurrence in a population, whereas linkage 
studies test whether they show correlated 
transmission within a pedigree. Association 
studies focus on population frequencies, whereas 
linkage studies focus on concordant 
inheritance. One may be able to detect linkage 
without association (for example, when 
there are many independent trait-causing 
chromosomes in a population, so that 
association with any particular allele is weak) or 
association without linkage (for example, 
when an allele explains only a minor 
proportion of the variance for a trait, so that 
the allele may occur more often in affected 
individuals but does a poor job of predicting 
disease status within a pedigree). Linkage 
and association are often used 
interchangeably in popular articles about genetics, but 
this practice should always be avoided. 

Experimental Crosses: Mapping 
Polygenic Traits, Including QTLs 

Experimental crosses of mice and rats 
offer an ideal setting for genetic dissection 
of mammalian physiology (Fig. 4). With 
the opportunity to study hundreds of 
meioses from a single set of parents, the 
problem of genetic heterogeneity disappears, 
and far more complex genetic 
interactions can be probed than is possible for 
human families. Animal studies are thus 
an extremely powerful tool for extending 
the reach of genetic analysis. Of course, 
animal studies must always be evaluated 
for their applicability to the study of 
human diseases. Because disease-causing 
mutations may occur at many steps in a 
pathway, animal models may not point to 
those genes most frequently mutated in 
human disease. However, animal studies 
should identify key genes acting in the same 
biochemical pathway or physiological 
system. Animal models that are poor models 
for pharmacologists seeking to evaluate a 
new human drug therapy may nonetheless 
be excellent models for geneticists seeking 
to elucidate the possible molecular 
mechanisms or pathways affected in a disease. 

The power of experimental crosses is 
most dramatically seen in the ability to 
dissect quantitative traits into discrete 
genetic factors (112). Systematic quantitative 
trait locus (QTL) mapping has only recently 
become possible with the construction of 
dense genetic linkage maps for mouse and 
rat (18, 113, 114) and the development of 
a suitable analytical approach for a 
whole-genome search, known as interval mapping. 
Interval mapping uses phenotypic and 
genetic marker information to estimate the 
probable genotype and the most likely QTL 
effect at every point in the genome, by 
means of a maximum-likelihood linkage 
analysis. The basic method was introduced 
by Lander and Botstein for a simple 
situation (47) but has been generalized to a wide 
variety of settings (59, 115, 116). In 
general, QTL mapping is much more powerful 
in experimental crosses than in human 
families because of the fundamental differences 
in the statistical comparisons involved 
(117) and because nongenetic noise can be 
decreased through the use of progeny tests, 
recombinant inbred strains, and 
recombinant congenic strains (47, 118). 

Genome-wide QTL analysis was first 
applied to fruit characteristics in the tomato 
(119), but it was soon used in mammals to 
study epilepsy in mice and hypertension in 
rats (113, 120). In the latter case, the 
animal study rapidly stimulated parallel human 
studies, with the reported linkage of the 
ACE gene to hypertension in rats 
provoking investigation of various genes in the 
pathway and leading to the implication of 
angiotensinogen in essential hypertension 
in humans. In only a short time, there has 
been an explosion of interest in QTL 
mapping in both agriculture and biomedicine 
(121). The approach opens the way to 
understanding the genetic basis for the 
tremendous strain variations seen in such 
quantitative traits as cancer susceptibility, 
drug sensitivity, resistance to infection, and 
aggressive behavior (122). The most 
important application of QTL mapping may turn 
out to be the identification of modifier 
genes affecting single-gene traits. Yeast 
geneticists routinely use suppressor analysis to 
study a mutant gene by isolating secondary 
mutations capable of modifying the original 
mutant phenotype. Although mammalian 
geneticists cannot easily use mutagenesis to 
find suppressors, they may be able to 
accomplish the same goal by breeding 
mutations onto different genetic backgrounds 
and dissecting the QTLs that affect the 
phenotypic expression. A first such 
example is the finding that intestinal neoplasias 
induced by mutations in the mouse Apc 
gene can be dramatically influenced by a 
modifier locus on chromosome 4 (18). By 
applying this approach to the ever-growing 
list of gene knockouts, it should be possible 
to identify many additional interacting 
genes. 

Experimental crosses also facilitate 
analysis of discrete traits with complex genetic 
etiology. Studies of type I diabetes in the 
nonobese diabetic mouse report the 
mapping of a dozen loci, each making a partial 
contribution to a threshold trait (123). 
Analysis of type I diabetes in the BB rat 
points to a purely synthetic interaction with 
one, two, or three genes required to produce 
disease, depending on the particular cross 
(124). 

After initial mapping, experimental 
geneticists can study the physiological effects 
of individual polygenic factors by 
constructing congenic strains that differ only in the 
region of a single locus. Genes may also be 
mapped more finely by systematically 
whittling away at the size of the congenic 
interval. In some cases, synteny conservation in 
gene order between different mammals may 
point to interesting regions to investigate in 
the human genome. 

An important point about the use of 
experimental crosses deserves to be 
emphasized, because it is commonly 
misunderstood. Genetic mapping results need not be 
consistent among different crosses. Linkage 
analysis reveals only those trait-causing 
genes that differ between the two parental 
strains used. A QTL may thus be detected 
in an A × B cross, but not in an A × C 
cross. Moreover, the effect of a QTL allele 
may change, or even disappear, when 
bred onto a different genetic background, 
because of epistatic effects of other genes. 

Statistical Significance 

One of the thorniest problems in the 
genetic analysis of complex traits is to know 
whether a result is statistically significant. 
Psychiatric genetics has confronted this 
issue most squarely, as reported linkages to 
manic depression or schizophrenia have 
typically failed to withstand close scrutiny 
or replication (57, 125). Statistical 
significance is a challenging problem because 
genetic analysis can involve two types of 
fishing expeditions: testing many chromosomal 
regions across a genome and testing 
multiple models for inheritance. 

Fig. 4. Experimental crosses can provide a large number of progeny while ensuring genetic homogeneity. 
As a result, experimental crosses permit the genetic dissection of more complex genetic interactions than 
directly possible in human families, such as mapping of QTLs. Examples include epilepsy in mice, 
hypertension in rats, type I diabetes in mice and rats, and susceptibility to intestinal cancer in mice. 

For example, human geneticists have 
long used the convention that a lod score 
exceeding 3 should be required to declare 
linkage to a simple Mendelian trait. The 
threshold was based on a Bayesian 
argument involving the prior probability of 
finding a gene and aimed to yield a false 
positive rate of 5%. Unfortunately, the 
reasoning does not extend well to the modern 
world of complex traits (with no clear prior 
hypothesis) or dense maps (with thousands 
of markers that can be tested). Instead, two 
approaches have gained favor in recent 
years. 

Analytical methods. Formally speaking, 
genetic dissection involves calculating a 
statistic X throughout a genome. The issue 
of statistical significance consists of 
choosing an appropriate threshold T for declaring 
the presence of a gene, such that the 
genome-wide false positive rate, Prob (X > 
T), is small under the null hypothesis that 
no gene is present. In some cases, the 
genome-wide false positive rate can be 
estimated on the basis of simple and elegant 
mathematical formulas. The unifying idea 
comes from the insight (47, 126) that many 
linkage statistics tend to an asymptotic null 
distribution that is closely related to a 
well-known physical process called the 
Ornstein-Uhlenbeck diffusion (which describes 
the velocity of a particle undergoing 
one-dimensional Brownian motion). The 
problem of random large excursions of such 
diffusions has been extensively studied and 
applies directly to genetic analysis. The 
genome-wide false positive rate, $\alpha^*_T$ = Prob 
(X > T somewhere in the genome), can be 
related to the nominal false positive rate, 
$\alpha_T$ = Prob (X > T at a single point), by the 
formula 

$$\alpha^*_T \approx [C + 2\rho G\,K(T)]\,\alpha_T$$

where C is the number of chromosomes, G 
is the genetic length of the genome in Morgans, 
and the constant $\rho$ and the function K(T) 
are defined in the notes (127). Solving $\alpha^*_T$ 
= 0.05 yields the appropriate threshold T. 
As confirmed by simulation studies, the 
estimates apply well to many basic 
situations (47, 128). Appropriate thresholds for 
various settings are shown in Table 1. For 
traditional human linkage analysis, the 
appropriate asymptotic lod score threshold for 
a 5% significance level is about 3.3. The 
traditional threshold of 3 actually yields a 
genome-wide false positive rate of about 
9%. Note that all of the thresholds 
correspond to nominal P values less than 10^-4; 
this is considerably more stringent than the 
level of 10^-3 applied by many authors. 
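
The relationship between nominal and genome-wide false positive rates can be explored numerically. The sketch below is illustrative only and rests on several assumptions that are not taken from the text: the genome-wide rate is approximated as [C + 2 ρ G h(T)] times the pointwise rate, with h(T) = 2 ln(10) T for a lod score T, ρ = 1, C = 23 chromosomes, and G = 33 Morgans; the pointwise rate is taken as the one-sided tail of a chi-square with 1 degree of freedom. All function names are hypothetical.

```python
import math

LN10 = math.log(10.0)

def pointwise_rate(lod):
    """One-sided nominal P value for a lod score (chi-square, 1 df)."""
    chi2 = 2.0 * LN10 * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))

def genome_wide_rate(lod, chromosomes=23, morgans=33.0, rho=1.0):
    """Approximate genome-wide rate: [C + 2*rho*G*h(T)] * alpha_T.

    h(T) = 2 ln(10) T is an assumption of this sketch.
    """
    h = 2.0 * LN10 * lod
    return (chromosomes + 2.0 * rho * morgans * h) * pointwise_rate(lod)

def lod_threshold(target=0.05, lo=1.0, hi=10.0, tol=1e-6):
    """Bisection for the lod score whose genome-wide rate equals target.

    Assumes the rate decreases with the lod score on [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if genome_wide_rate(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"genome-wide rate at lod 3.0: {genome_wide_rate(3.0):.3f}")
print(f"lod threshold for 5%: {lod_threshold():.2f}")
```

Under these assumptions the sketch gives a genome-wide rate of roughly 9% at a lod score of 3 and a 5% threshold close to 3.3, in line with the figures quoted in the text.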

The problem of searching over 
alternative models has received formal attention in 
only a few cases (61). Current practice is to 
consider that each of the k models 
examined is statistically independent. The 
traditional Bonferroni correction prescribes 
dividing the required significance level by k or, 
equivalently, increasing the required lod 
score threshold by about log10(k) (129). 
The approach will likely be too 
conservative if the models are dependent. 

Simulation studies. Unfortunately, the 
analytical approach depends on key 
assumptions (such as the normality of an 
underlying statistic and the pooling of many 
independent meioses) which will often be 
false in important situations, for example, 
affected-pedigree-member (APM) analysis 
of a modest number of large pedigrees. The 
best approach in such cases is to directly 
estimate the false positive rate by 
simulation. In most settings, one can randomly 
generate the inheritance pattern of genetic 
markers in a pedigree according to the laws 
of Mendelian inheritance and then 
recalculate the value of the statistic X for each such 
replicate (61, 130). In some settings, one can 
apply permutation tests such as scrambling 
the phenotypes or genotypes in a sib pair or 
QTL analysis (131). Simulation-based tests 
have received a great deal of attention in 
statistics in general (132) and are very 
appropriate for many genetic analysis settings (61, 
130, 131). They have been applied to the 
problem of genome-wide search and model 
selection (61). We strongly advocate this 
approach, although broad use will require 
increased dissemination of computer programs 
for simulation analysis. 
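
A permutation test of the kind mentioned above can be sketched briefly. The example is a minimal illustration for a single marker in a hypothetical backcross, with the difference in mean phenotype between genotype classes as the statistic; the data, the statistic, and the function names are all assumptions of this sketch, not the procedure of any cited reference.

```python
import random

def mean_difference(phenotypes, genotypes):
    """Difference in mean phenotype between genotype classes 1 and 0."""
    g1 = [p for p, g in zip(phenotypes, genotypes) if g == 1]
    g0 = [p for p, g in zip(phenotypes, genotypes) if g == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def permutation_p_value(phenotypes, genotypes, n_perm=2000, seed=0):
    """Empirical P value: share of shuffles at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(mean_difference(phenotypes, genotypes))
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        # Scrambling phenotypes breaks any genotype-phenotype relation.
        rng.shuffle(shuffled)
        if abs(mean_difference(shuffled, genotypes)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: genotype 0/1 at one marker, simulated trait values.
data_rng = random.Random(1)
genotypes = [i % 2 for i in range(60)]
phenotypes = [data_rng.gauss(1.5 * g, 1.0) for g in genotypes]
print("empirical P =", permutation_p_value(phenotypes, genotypes))
```

For a genome-wide search, the same shuffling would be applied once per replicate to the whole data set and the maximum statistic over all markers recorded, so that the empirical distribution reflects the full scan rather than a single point.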

A final issue should be noted. The 
appropriate thresholds for whole-genome 
searches should always be applied to any 
new hypothesis, even if one only searches 
over a small subset of the genome. The 
reason is that traits of interest will typically 
be studied by multiple investigators, but 
only positive results will be published. The 
genetics community as a whole is thus 
conducting a whole-genome scan, and the full 
multiple testing threshold should be applied 
to any positive result. Some authors have 
suggested avoiding this problem by 
developing hypotheses in one data set and 
retesting them in another (133). This can be 
helpful, but one must still apply a correction 
if one expects to retest multiple hypotheses 
at the second stage. 

Experimental Design 

In designing a genetic dissection, two 
crucial choices arise: (i) the number and type 
of families from which to collect data and 
(ii) the number and type of genetic markers 
to use. To make these choices, one needs to 
know the statistical power to detect a gene 
as a function of these choices. 

Table 1. Thresholds corresponding to a genome-wide significance level of 5%, for standard 
linkage analysis and the most common types of affected-relative analysis in the human and for 
QTL mapping in experimental crosses. Thresholds are given in terms of lod scores and normal 
scores. The assumed genome size is 3300 cM for the human. [The remainder of the caption and 
the threshold values are not legible in the source.] 

Applications listed: 
Standard linkage analysis, one free parameter 
Allele sharing: Grandparent-grandchild pairs 
Allele sharing: Half-sib or sib pairs 
Allele sharing: Sib pairs (parents untyped) 
Allele sharing: Uncle-nephew or first-cousin pairs 
Backcross or intercross: 1 degree of freedom 
Intercross: 2 degrees of freedom 
Recombinant inbred lines: 1 degree of freedom 

For a simple Mendelian monogenic trait, 
a basic rule of thumb suffices: With a 
genetic map containing highly polymorphic 
markers every 20 centimorgans, linkage can 
be easily detected with about 40 
informative meioses (21). More generally, the 
power to detect linkage depends essentially 
on the number of informative meioses, 
almost regardless of family structure. Power 
can be approximated simply by counting 
informative meioses and can be more 
precisely estimated with simulation-based 
computer packages such as SIMLINK and 
SLINK (135). 
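
A rough calculation shows why about 40 informative meioses are ample for a Mendelian trait. The sketch below assumes fully informative, phase-known meioses, for which a nonrecombinant contributes log10[2(1 - θ)] and a recombinant log10(2θ) to the lod score; the use of θ = 0.10 as a stand-in for a gene lying about 10 cM from the nearest marker is a further simplifying assumption, and the function names are hypothetical.

```python
import math

def expected_lod_per_meiosis(theta):
    """Expected lod from one phase-known informative meiosis.

    The lod is evaluated at the true recombination fraction theta.
    """
    if theta == 0.0:
        return math.log10(2.0)
    return (math.log10(2.0)
            + theta * math.log10(theta)
            + (1.0 - theta) * math.log10(1.0 - theta))

def expected_lod(n_meioses, theta):
    """Expected total lod score from n independent informative meioses."""
    return n_meioses * expected_lod_per_meiosis(theta)

# Markers every 20 cM leave a gene at most about 10 cM from a marker.
print(f"theta = 0.00: {expected_lod(40, 0.0):.2f}")
print(f"theta = 0.10: {expected_lod(40, 0.10):.2f}")
```

Even at θ = 0.10 the expected lod score from 40 meioses is about 6.4, comfortably above the conventional threshold of 3.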

In contrast, there is no comparable 
prescription for a complex trait. The optimal 
experimental design depends on the precise 
details of the genetic complexities, 
information which is typically not known in 
advance. The best compromise is to design a 
study to have sufficient power to detect any 
genes with effects exceeding a given 
magnitude. For example, one can calculate the 
number of sib pairs required to use 
allele-sharing methods to detect a locus that 
increases the relative risk to siblings by at 
least twofold (32, 82, 136). However, even 
if the overall relative risk to siblings is large, 
there is no guarantee that there exists any 
individual locus having an effect of this 
magnitude. Similarly, one can calculate the 
number of progeny needed to detect a QTL 
accounting for 10% of the phenotypic 
variance of a trait, but predicting whether any 
such loci will be present is possible only 
under very favorable circumstances (137). 
Genetic analyses of complex traits should 
always explicitly report the minimum effect 
that could have been reliably detected 
given the subjects studied. 

The optimal choice of which families or 
crosses to study may also vary with the 
circumstances. For human studies, the 
range of choices includes whether to focus 
on individuals with extreme phenotypes, 
when to extend a pedigree, and whether to 
prefer or to exclude families with too many 
affected individuals (137). For animal 
studies, the issues include whether to set up a 
backcross or intercross and whether to 
concentrate on the progeny with the most 
extreme phenotypes (47, 138). 

The optimal density of genetic markers 
is a topic requiring more attention. The 
effect of polymorphism rate on the power of 
allele-sharing methods has been studied for 
single markers (33, 95, 136, 139), but not 
for the more realistic situation of multipoint 
mapping. It is clear that denser maps are 
needed for the study of sib pairs without 
available parents or for the study of more 
distant relatives, but quantitative guidance 
is lacking. The effect of marker density on 
experimental crosses has been more 
extensively studied (47, 140). Finally, a few 
authors have begun to explore two-tiered 
strategies, in which initial evidence is 
obtained with a sparse map and then 
confirmed with a dense map (141). 

Cloning Genes That Underlie 
Complex Traits 

Once genetic dissection implicates a 
chromosomal region, there remains the 
formidable task of identifying the responsible gene. 
That type I diabetes cosegregates with 
anonymous markers on chromosome 11q in 
the human or that hypertension 
cosegregates with the ACE gene in rat crosses 
simply indicates that a causative gene lies 
somewhere nearby. However, the possible 
region might be as large as 10 to 20 Mb, 
enough to contain 500 genes. Positional 
cloning requires higher resolution mapping 
to narrow the search to a tractable region. 

For a simple Mendelian trait, the 
situation is most favorable. Because the 
responsible gene must show perfect cosegregation 
with the trait, even a single crossover 
suffices to eliminate a region from 
consideration. From a study of 200 meioses, the 
interval can be pared to about 1 cM, 
corresponding to about 1 Mb (142). Still, the 
challenge is considerable. It is sobering to 
note that virtually all successful positional 
cloning efforts have depended on the 
fortuitous presence of chromosomal 
aberrations, trinucleotide repeat expansions, or 
previously known candidate genes. Only 
two human disease genes have been 
positionally cloned solely on the basis of point 
mutations: cystic fibrosis and diastrophic 
dysplasia (DTD) (143). 

For complex traits, positional cloning 
will likely be even harder. Because 
cosegregation is not expected to be perfect, single 
crossovers no longer suffice for 
fine-structure mapping. Resolution becomes a 
statistical matter (144). For a gene conferring a 
relative risk of twofold, for example, one 
would need to examine a median number of 
nearly 600 sib pairs to narrow the likely 
region (95% confidence interval) to 1 cM. 
Moreover, the genes underlying complex 
traits may be subtle missense mutations 
rather than gross deletions. How will 
positional cloners overcome these obstacles? 

In the human, the most powerful 
strategy may prove to be linkage disequilibrium 
mapping in genetically isolated populations 
(21, 145). The idea is to find many affected 
individuals who have inherited the same 
disease-causing allele from a common 
ancestor. Such individuals will tend to have 
retained the particular pattern of alleles 
present on the ancestral chromosome, with 
the immediate vicinity of the gene being 
evident as the region of maximal retention. 
In effect, the method exploits information 
from many historical meioses and thereby 
affords much higher recombinational 
resolution. Fine-structure linkage 
disequilibrium mapping has been applied to the 
isolated Finnish population (founded about 100 
generations ago) to permit the cloning of 
the DTD gene (143). Whereas 
conventional recombinational mapping was only able 
to localize the gene to within about 1.5 cM, 
linkage disequilibrium studies were able to 
pinpoint it to within about 50 kb. The 
approach is also applicable to younger 
populations: linkage disequilibrium should be 
detectable over larger distances, although 
the ultimate resolving power will be less 
(146). Elegant studies in the Mennonite 
population (founded about 10 generations 
ago) have allowed initial mapping of genes 
involved in a recessive form of 
Hirschsprung disease (20). 
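
The dependence of linkage disequilibrium on population age can be illustrated with a simple decay model. The sketch assumes that a marker at recombination fraction θ from the mutation stays on the ancestral chromosome with probability (1 - θ)^N after N generations, and that θ is approximately the distance in Morgans for short distances. The rule of thumb that disequilibrium persists over roughly 100/N cM is likewise treated as an assumption here, and the function names are hypothetical.

```python
def ld_range_cm(generations):
    """Order-of-magnitude distance (cM) over which LD persists: 100 / N."""
    return 100.0 / generations

def fraction_unrecombined(distance_cm, generations):
    """Chance that marker and mutation were never separated in N generations.

    Uses (1 - theta)^N with theta taken as distance in Morgans, which is
    reasonable only for short distances.
    """
    theta = distance_cm / 100.0
    return (1.0 - theta) ** generations

for label, n in (("about 100 generations", 100), ("about 10 generations", 10)):
    d = ld_range_cm(n)
    kept = fraction_unrecombined(d, n)
    print(f"{label}: range ~{d:.0f} cM, fraction retained at that distance {kept:.2f}")
```

An older population therefore gives finer resolution but requires denser markers, whereas a younger population shows disequilibrium over longer stretches at the cost of resolving power.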

In animal models, fine-structure 
mapping of factors such as QTLs can be 
accomplished through appropriate breeding. The 
key is to ensure unambiguous genotyping at 
the trait-causing locus. The best solution is 
probably to (i) create congenic strains 
differing only in the region of interest, (ii) 
cross these strains to construct recombinant 
chromosomes (that is, ones in which there 
has been a crossover between flanking 
genetic markers), and (iii) evaluate each 
recombinant chromosome to determine 
which trait-causing allele is carried by 
performing progeny testing (that is, examining 
the phenotype of many progeny carrying 
the chromosome) (113). The construction 
of the required congenic strains would 
traditionally require 20 generations of 
breeding. With the advent of complete genetic 
linkage maps, however, one can construct 
"speed congenics" in only three to four 
generations by using marker-directed 
breeding (147). 
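
The gain from marker-directed breeding can be illustrated with a simple geometric model. The sketch assumes that an average of 50% of the unlinked, undesired genome is lost at each ordinary backcross generation, and it uses a purely hypothetical 80% loss per generation to stand in for marker-selected progeny; the function name and the 80% figure are assumptions for illustration, not measured values.

```python
def undesired_fraction(generations, loss_per_generation=0.5):
    """Expected share of unlinked undesired genome left after backcrossing."""
    if not 0.0 < loss_per_generation <= 1.0:
        raise ValueError("loss_per_generation must be in (0, 1]")
    return (1.0 - loss_per_generation) ** generations

# Ordinary backcrossing: half of the remaining undesired genome lost each time.
print(f"20 generations at 50%: {undesired_fraction(20):.2e}")
print(f" 4 generations at 50%: {undesired_fraction(4):.4f}")
# Hypothetical marker-selected progeny losing 80% per generation.
print(f" 4 generations at 80%: {undesired_fraction(4, 0.8):.4f}")
```

Selecting at each generation the progeny that happen to have lost the most undesired genome steepens the decay, which is why far fewer generations are needed.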

The Human Genome Project promises 
to make a tremendous contribution to the 
positional cloning of complex traits by 
eventually providing a complete catalog of 
all genes in a relevant region. With such 
information, positional cloning will be 
reduced to the systematic evaluation of 
candidate genes, still challenging, but far more 
manageable than today's more haphazard 
forays. Indeed, the Human Genome Project is 
essential if the genetic analysis of complex 
traits is to achieve its full potential. 

Finally, candidate genes, whether 
identified by positional cloning or guessed a 
priori, must always be subjected to rigorous 
evaluation before they are accepted. The 
gold-standard tests for human genes should 
include association studies demonstrating a 
clear correlation between functionally 
relevant allelic variations and the risk of disease 
in humans, and transgenic studies 
demonstrating that gene addition or gene 
knockout in animals produces a phenotypic effect. 
For genes identified from experimental 
animal crosses, one can and should go a step 
further by demonstrating that an induced 
knockout allele at the candidate gene fails 
to complement an allele at the locus to be 
cloned (148). 

Conclusion 

In the early 1900s, the fledgling theory of 
Mendelian genetics was attacked on the 
grounds chat the simple, discrete inheri- 
tance patterns of pea shape or Drosophila 
eye color did not apply to the variation 
typically seen in nature (149). After 20 
years of acrimonious battle, the issue was
eventually resolved with the theoretical un- 
derstanding that Mendelian factors could 
give rise to complex and continuous traits, 
even if direct identification of the genes 
themselves was not practical. Now, with
the advent of dense genetic linkage maps, 
geneticists are taking up the challenge of
the genetic dissection of complex traits. If 
they are successful, the tools of genetics will 
be brought to bear on some of the most 
important problems in human health and in 
agriculture, and the Mendelian revolution 
will finally be complete.

mative meioses. It follows that the expected
distance to the nearest flanking crossover on either
side is (1/N) cM. [See also K. Lange, L. Kunkel, J.
Aldridge, S. A. Latt, Am. J. Hum. Genet. 37, 853
(1985); M. Boehnke, ibid. 55, 379 (1994).]

143. J. Hästbacka et al., Cell 78, 1073 (1994).

144. N. Risch, Am. J. Hum. Genet. 53 (suppl.), abstract
186 (1993); L. Kruglyak and E. S. Lander, submitted.

145. B. Kerem et al., Science 245, 1073 (1989); J. Hästbacka
et al., Nat. Genet. 2, 204 (1992).

146. In a population founded N generations ago, linkage
disequilibrium should be detectable over distances
on the order of (100/N) cM. By studying k affected
chromosomes, one should be able to localize a
disease gene to a region on the order of (100/kN)
cM. (The exact numbers would depend on the precise
localization of a disease gene, but these estimates
reflect the scaling with population age and number
of affected chromosomes.)
147. Traditional construction of congenic strains by
repeated backcrossing relies on the fact that an
average of 50% of the undesired genome is lost
at each generation. By using a complete genetic
linkage map, however, one can identify those
backcross progeny that have fortuitously lost a
larger proportion of the undesired genome and
breed them to create the next generation. In only
three to four generations, it is possible to eliminate
essentially all of the undesired genome. For
example, this has been performed to construct
congenic strains for the Mom-1 region of mouse
chromosome 4 (A. Moser and W. F. Dietrich,
personal communication).
148. Complementation tests can be performed only
between two alleles causing the same recessive
phenotype. Accordingly, knockout experiments
should target an allele A1 that causes a dominant
(or partially dominant) phenotype when placed in
trans to a second allele A2; the knockout allele
would then be expected to fail to yield the dominant
phenotype in the complementation test.
Because current gene knockout protocols are
limited to a few mouse strains such as 129, one
may first need to construct a congenic carrying
the desired allele in such a strain before one can
construct the appropriate knockout.
149. For a discussion of the criticism
of Mendelian theory on the grounds that it cannot
explain variation observed in nature, see W. B. Provine,
The Origins of Theoretical Population Genetics
(Univ. of Chicago Press, Chicago, 1971).
150. We thank L. Kruglyak and D. Siegmund for assistance
concerning thresholds for significance and C.
Amos, M. Boehnke, A. Chakravarti, F. Collins, R.
Elston, W. Frankel, D. Fulker, S. Ghosh, S.-W. Guo,
H. Jacob, J. Ott, A. Weder, A. Lyw, and members
of the Lander laboratory for helpful comments on
the manuscript. This work was supported in part by
a grant from NIH (HG00098 to E.S.L.).


